John Moores University. Main CWIS Site
  


Median Routine

Derive has numerous in-built statistical functions. At the time of writing, it does not have a median function. 

This is a very simple routine, which demonstrates some basic array (also known as vector) manipulation. Such vector manipulation is very important in many aspects of mathematical programming.

The three main measures of central tendency for a set of data are the mean (average), median and mode. Derive already has an AVERAGE() function. Other useful in-built Derive functions are SUM, DIM, SORT and SUB.

Suppose we have a one-dimensional matrix. This is also called a vector, or an array. For the purposes of this exercise, it does not matter whether it is a column or a row vector. However, measures of central tendency assume that the elements are numbers. Take the vector:

a:=[4, 5, -12, 1, -9, 20]

The mean, median and mode are intuitive. To find the mean (average), Derive simply adds up the elements, and divides by the number of elements. Hence,

AVERAGE(a) = 1.5

SUM(a) = 9

DIM(a) = 6

a SUB 3 =  -12  

Here, DIM(a) is the DIMENSION, or number of elements, of 'a'. Also, the 'SUB' command identifies the individual vector elements - a SUB 3 is the third element of vector a: in this case, -12.  
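
For readers without Derive to hand, the same four results can be cross-checked in a few lines of Python. This is only an illustrative sketch and not Derive code. The variable names are assumptions made for the example, and the index is shifted because Python counts from 0 whereas Derive's SUB counts from 1.

```python
# Python cross-check of the Derive results above (illustrative only).
a = [4, 5, -12, 1, -9, 20]

total = sum(a)          # corresponds to SUM(a)
count = len(a)          # corresponds to DIM(a)
mean = total / count    # corresponds to AVERAGE(a)
third = a[3 - 1]        # corresponds to a SUB 3; Python indexes from 0

print(total, count, mean, third)  # prints: 9 6 1.5 -12
```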

If the dimension of a vector is odd, the median of a set of data is the mid-value of the ordered vector. If the number of elements is even, we take the two mid-values of the sorted array, and find their average. To find the median, we:
sort the array;
count the number of elements;
calculate the position of the mid-point(s);
find the value at the mid-point(s), i.e. the median.
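
The four steps above can be sketched in Python before turning to Derive. This is a hedged illustration, not Derive code: the function name median_by_steps is hypothetical, and the positions are shifted by one because Python lists are indexed from 0.

```python
def median_by_steps(values):
    """Median via the four steps: sort, count, locate the mid-point(s), read off."""
    ordered = sorted(values)            # step 1: sort the array
    n = len(ordered)                    # step 2: count the elements
    if n == 0:
        raise ValueError("median of an empty vector is undefined")
    if n % 2 == 1:
        # step 3 (odd n): single mid-point at 1-based position (n + 1)/2
        return ordered[(n + 1) // 2 - 1]
    # step 3 (even n): mid-points at 1-based positions n/2 and n/2 + 1
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2          # step 4: average the two mid-values


print(median_by_steps([4, 5, -12, 1, -9, 20]))  # prints: 2.5
```

For the example vector, the sorted order is [-12, -9, 1, 4, 5, 20], so the two mid-values are 1 and 4.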

Such a function could be coded in the first instance as

median(a, n) :=
    Prog
        a := SORT(a)
        n := DIM(a)
        If ODD?(n)
           a SUB ((n + 1)/2)
           (a SUB (n/2) + a SUB (n/2 + 1))/2

As a one-line entry into the Derive author line:

median(a, n) := PROG(a := SORT(a), n := DIM(a), IF(ODD?(n), a SUB ((n + 1)/2), (a SUB (n/2) + a SUB (n/2 + 1))/2))

Now let's break down the program to see how it works.
Line 1 of the PROG uses Derive's internal sort routine to sort the elements of the vector a into ascending order.
Line 2 determines the number of elements in the vector.
Line 3 is an IF function whose condition is ODD?(n). The function ODD?(n) returns true if n is odd and false otherwise. If n is odd, the median is element (n+1)/2 of the sorted vector; otherwise it is the average of elements n/2 and n/2 + 1. For the example vector, SORT(a) = [-12, -9, 1, 4, 5, 20] and n = 6, so the median is (1 + 4)/2 = 2.5. As the IF function is the last action of the PROG(), its outcome is returned automatically (hence there is no need to put RETURN commands in the THEN and ELSE arguments).

A more imaginative approach would be
median(a, m) :=
     Prog
       a := SORT(a)
       m := (DIM(a) + 1)/2
       (a SUB FLOOR(m) + a SUB CEILING(m))/2

As a one-line entry into the Derive author line:

median(a, m) := PROG(a := SORT(a), m := (DIM(a) + 1)/2, (a SUB FLOOR(m) + a SUB CEILING(m))/2)

This version removes the need for an IF function in the program, but uses the FLOOR() and CEILING() functions. When DIM(a) is odd, m is an integer, so both subscripts pick the same middle element; when DIM(a) is even, m ends in .5, so they pick the two middle elements.
The FLOOR(m) function simplifies to the greatest integer less than or equal to m. If m is nonnegative, this is equivalent to the integer part of m. For example, FLOOR(3.141)=3 but FLOOR(-3.141)=-4.

The CEILING(m) function simplifies to the smallest integer greater than or equal to m. For example, CEILING(3.141)=4 but CEILING(-3.141)=-3.
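
The FLOOR/CEILING idea can be tried out in Python, whose math.floor and math.ceil follow the same rounding rules as described above. This is a sketch for illustration rather than Derive code. The function name median_floor_ceiling is hypothetical, and the input is assumed to be a non-empty list of numbers.

```python
import math


def median_floor_ceiling(values):
    """Median without an if-test, using floor and ceiling of the mid-position."""
    ordered = sorted(values)
    m = (len(ordered) + 1) / 2           # 1-based mid-position; ends in .5 when n is even
    lower = ordered[math.floor(m) - 1]   # shift by 1: Python lists are 0-based
    upper = ordered[math.ceil(m) - 1]
    return (lower + upper) / 2           # lower == upper when n is odd


print(math.floor(-3.141), math.ceil(-3.141))          # prints: -4 -3
print(median_floor_ceiling([4, 5, -12, 1, -9, 20]))   # prints: 2.5
```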

Or we could cunningly program the median with
median(a) :=
   Prog
     a := SORT(a)
     a :+ REVERSE(a)
     (a SUB CEILING(DIM(a), 2))/2

which is 
median(a) := PROG(a := SORT(a), a :+ REVERSE(a), (a SUB CEILING(DIM(a), 2))/2)

in 1-D entry line format.

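This version uses the REVERSE(v) function, which reverses the elements of the vector v, e.g. REVERSE([1,2,3])=[3,2,1], and the update operator :+, so that a :+ REVERSE(a) adds REVERSE(a) to a element by element. After this step, element i of a holds the sum of the i-th smallest and the i-th largest values, so the element at position CEILING(DIM(a), 2) is the sum of the two mid-values (or twice the single mid-value when DIM(a) is odd), and halving it gives the median. Also, CEILING(DIM(a), 2) is exactly the same as CEILING(DIM(a)/2).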
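
To see why adding the reversed vector works, the trick can be reproduced in Python. This is an illustrative sketch, not Derive code. The function name median_by_reversal is hypothetical, and a non-empty list of numbers is assumed.

```python
import math


def median_by_reversal(values):
    """Median via sorted + reversed: the middle pair-sum is twice the median."""
    ordered = sorted(values)
    # paired[i] = (i+1)-th smallest + (i+1)-th largest value
    paired = [x + y for x, y in zip(ordered, reversed(ordered))]
    k = math.ceil(len(ordered) / 2)      # 1-based mid-position
    return paired[k - 1] / 2             # shift by 1: Python lists are 0-based


# For [4, 5, -12, 1, -9, 20] the pair-sums are [8, -4, 5, 5, -4, 8],
# and the third one, 5, is 1 + 4.
print(median_by_reversal([4, 5, -12, 1, -9, 20]))  # prints: 2.5
```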

It may be a good idea to test whether any of the elements of the vector are not numbers. A quick way to do this would be to insert the line IF(DIM(VARIABLES(a))>0, RETURN false) at the beginning. The VARIABLES() command returns a vector of all the variables in a, so if there are no variables in a, i.e. all the elements are numbers, then DIM(VARIABLES(a))=0. If there are variables or strings in a, then DIM(VARIABLES(a))>0.
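
The same guard-then-compute pattern might look as follows in Python. This is a loose analogue rather than a translation: Python has no VARIABLES() function, so the sketch checks each element's type instead, and it returns None where the Derive version returns false. The function name checked_median is hypothetical.

```python
from numbers import Real
from statistics import median


def checked_median(values):
    """Return the median, or None if any element is not a real number."""
    for x in values:
        # bool is a subclass of int in Python, so it is excluded explicitly
        if isinstance(x, bool) or not isinstance(x, Real):
            return None
    return median(values)


print(checked_median([4, 5, -12, 1, -9, 20]))  # prints: 2.5
print(checked_median([4, "x", 1]))             # prints: None
```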