Probability Distributions

USING SPSS FOR SIMULATIONS OF SPECIAL DISTRIBUTIONS

SPSS will generate simulated “results” from binomial, Poisson or normal distributions, provided you specify the necessary parameters (e.g. mean, standard deviation).

HOW TO PREPARE A COLUMN FOR SIMULATED RESULTS

At the top of the column you wish to use:

 Double click on grey box
 Variable name = “heads”
 Under Change Settings, click on Type
 Change decimal places from 2 to 0
 Click continue
 OK

Next establish your column length

Decide HOW MANY simulated results you require (30? 50? 100? 500?). For 30 results, use the cursor keys to go down to the 30th cell and type in a number (any number). If your column length is to be 50, do this in cell number 50, and so on. (This is not your group size (n) but the number of times the experiment is to be repeated.)

SIMULATED RESULTS FROM A BINOMIAL DISTRIBUTION

Follow the instructions on preparing a column (see above).
The instructions here will produce simulated results for B(10, 0.5). The number appearing in each cell represents the number of heads obtained when 10 coins are tossed.

 Click on Transform then Compute
 In the Target Variable box, type your column name (e.g. HEADS).
 In the Functions box (right-hand side), scroll down to RV.BINOM(n,p)
 Highlight it and click on the arrow to move it into the top box (Numeric Expression)
 Type in group size n (e.g. 10) in place of first question mark
 Type in p (e.g. 0.5) in place of second question mark.
 Click on OK
 Change existing variable - OK

The number in each cell represents the number of successes (or heads) obtained in one group of size n = 10.
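For readers who want to check the idea outside SPSS, here is a minimal Python sketch (an illustration only; SPSS does not use this code). It builds each cell the long way, by tossing n simulated coins and counting the successes, so every value is one draw from B(n, p). The function name `simulate_binomial` and the seed value are hypothetical choices made for this example.

```python
import random

def simulate_binomial(n, p, repeats, seed=None):
    """Return `repeats` simulated counts from B(n, p).

    Each count is the number of successes (e.g. heads) in n independent
    trials, each succeeding with probability p.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(repeats)]

# 30 repeats of "toss 10 coins and count the heads", i.e. B(10, 0.5)
heads = simulate_binomial(10, 0.5, 30, seed=1)
print(heads)
```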
 

TO DRAW A BAR GRAPH

 Click on graphs
 Bar graph - simple - define
 Enter variable name (= heads) as category axis
 Use the Statistics menu to find the mean and standard deviation
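The mean and standard deviation of a simulated column can be sketched in Python as follows, assuming a freshly simulated B(10, 0.5) column of length 30. The helper `simulate_binomial` and the seed are hypothetical; `statistics.stdev` divides by (number of values − 1), i.e. it gives the sample standard deviation.

```python
import random
import statistics

def simulate_binomial(n, p, repeats, seed=None):
    # Count successes in n trials, repeated `repeats` times (one value per cell)
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(repeats)]

heads = simulate_binomial(10, 0.5, 30, seed=1)

mean = statistics.mean(heads)
# statistics.stdev divides by (number of values - 1): the sample standard deviation
sd = statistics.stdev(heads)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```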

 

INVESTIGATION

 

INCREASE THE NUMBER OF TRIALS

 (Click on window, choose data file if you are in the Chart file)
 Enter a number in cell no. 100 in column 1 in order to generate 100 results (you may need to use the cursor keys), OR start a new column in column 2.
 Click on transform
 Compute
 OK
 Change existing variable - OK

Look at a new bar graph for larger numbers of trials.


 
What do you notice?

What are the mean and standard deviation this time?

Is the mean closer to “np” than it was before?

Is the standard deviation closer to “√(npq)”?

Investigate what happens if you increase the number of trials again (to, say, 500).
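A rough Python sketch of this investigation, assuming B(10, 0.5) as above: it simulates columns of length 30, 100 and 500 and prints each mean and standard deviation beside the theoretical np and √(npq). The seed and the helper name `simulate_binomial` are arbitrary choices; with random data a longer column usually, but not always, lands closer to the theory.

```python
import math
import random
import statistics

def simulate_binomial(n, p, repeats, rng):
    # One simulated B(n, p) value per cell of a column of length `repeats`
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(repeats)]

n, p = 10, 0.5
q = 1 - p
rng = random.Random(2)
print(f"theory: np = {n * p}, sqrt(npq) = {math.sqrt(n * p * q):.3f}")

summary = {}
for repeats in (30, 100, 500):
    data = simulate_binomial(n, p, repeats, rng)
    summary[repeats] = (statistics.mean(data), statistics.stdev(data))
    m, s = summary[repeats]
    print(f"{repeats:4d} trials: mean = {m:.3f}, sd = {s:.3f}")
```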
 

FURTHER INVESTIGATIONS WITH BINOMIAL DATA

1.)  Keep the number of trials the same (column length) but change the value of p (e.g. 0.1, 0.2, 0.3, 0.4 etc.).

2.)  Compare p = 0.1 with p = 0.9

3.)  What happens if you keep p the same but change n?
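These three investigations can be tabulated with a short Python sketch. The helper `summarise`, the column length of 100 and the lists of n and p values are illustrative assumptions. For each n and p it reports the simulated mean and standard deviation next to np and √(npq).

```python
import math
import random
import statistics

def summarise(n, p, repeats, rng):
    """Simulate `repeats` values from B(n, p) and return
    (simulated mean, simulated sd, np, sqrt(npq))."""
    data = [sum(1 for _ in range(n) if rng.random() < p) for _ in range(repeats)]
    return (statistics.mean(data), statistics.stdev(data),
            n * p, math.sqrt(n * p * (1 - p)))

rng = random.Random(3)
repeats = 100  # column length kept the same throughout
for n in (10, 20):
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        m, s, mu, sigma = summarise(n, p, repeats, rng)
        print(f"n={n:2d} p={p}: mean {m:5.2f} (np = {mu:4.1f}), "
              f"sd {s:4.2f} (sqrt(npq) = {sigma:4.2f})")
```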
 

NOTE that the bar graphs produced by SPSS here will not show values of x that have zero frequency.
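To see the zero frequencies that the SPSS bar graph leaves out, a frequency table can list every possible value of x from 0 to n. This Python sketch assumes B(10, 0.1) with 50 repeats, a setting where several of the higher values of x usually never occur; the names and the seed are hypothetical.

```python
import random
from collections import Counter

def simulate_binomial(n, p, repeats, rng):
    # One simulated B(n, p) value per cell
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(repeats)]

n = 10
data = simulate_binomial(n, 0.1, 50, random.Random(4))
counts = Counter(data)

# List every possible value x = 0..n, including those that never occurred
table = [(x, counts.get(x, 0)) for x in range(n + 1)]
for x, f in table:
    print(f"{x:2d} | {'*' * f} ({f})")
```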
 

TO GENERATE SIMULATED RESULTS FROM A POISSON DISTRIBUTION

Follow the instructions above on “how to prepare a column”. The length of your column will represent the number of samples. The result appearing in each cell will be the result for one sample, e.g. the number of arrivals in a one-minute interval, or the number of fish in a 10-litre sample of water.

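Click on Transform then Compute. As for the binomial distribution, choose RV.POISSON(?) from the function box and type in the mean in place of the question mark.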
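A Poisson column can be simulated outside SPSS too. This Python sketch uses Knuth's multiplication method (a standard textbook technique, not necessarily what SPSS does internally): multiply uniform random numbers together until the product falls to e^(−mean) or below, and count the multiplications needed after the first number. The mean of 3 arrivals per minute is a made-up value, and `simulate_poisson` is a hypothetical name. The method suits small means like this one but becomes slow for very large means.

```python
import math
import random

def simulate_poisson(mean, repeats, seed=None):
    """Return `repeats` simulated values from a Poisson distribution.

    Knuth's method: keep multiplying uniform random numbers together until
    the product drops to exp(-mean) or below; the number of multiplications
    needed is one Poisson value.
    """
    rng = random.Random(seed)
    limit = math.exp(-mean)
    results = []
    for _ in range(repeats):
        k, product = 0, rng.random()
        while product > limit:
            k += 1
            product *= rng.random()
        results.append(k)
    return results

# e.g. 30 one-minute intervals with (hypothetically) 3 arrivals per minute on average
arrivals = simulate_poisson(3, 30, seed=5)
print(arrivals)
```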

Draw a bar graph of your simulated results

Use the Statistics menu to find the mean and standard deviation.

Use your calculator to square the value of the standard deviation you obtained.

Why would you expect this answer to be similar in value to the mean?

Try increasing your column length.
Does increasing the number of samples (i.e. the column length) give values of the mean and variance that are closer to each other?
Why might this happen?
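The mean-versus-variance question can be explored with a Python sketch that simulates Poisson columns of increasing length and prints the mean beside the squared standard deviation. The mean of 4, the seed and the helper name `mean_and_variance` are assumptions for illustration; the values are generated with Knuth's multiplication method.

```python
import math
import random
import statistics

def mean_and_variance(mean, repeats, rng):
    """Simulate `repeats` Poisson(mean) values (Knuth's multiplication method)
    and return (sample mean, square of sample standard deviation)."""
    limit = math.exp(-mean)
    data = []
    for _ in range(repeats):
        k, product = 0, rng.random()
        while product > limit:
            k += 1
            product *= rng.random()
        data.append(k)
    return statistics.mean(data), statistics.stdev(data) ** 2

rng = random.Random(6)
for repeats in (30, 100, 500):
    m, v = mean_and_variance(4, repeats, rng)
    print(f"{repeats:4d} samples: mean = {m:.2f}, sd squared = {v:.2f}")
```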

COMPARING BINOMIAL AND POISSON DISTRIBUTIONS

The Poisson distribution was often used as an approximation to the binomial distribution, as its probabilities are easier to calculate (if you have no calculator or computer).

Just how close are the distributions ?

Generate 100 simulated results from a binomial distribution with

n = 10, p = 0.5

Draw a bar graph of your results.

Now generate 100 simulated results from the Poisson distribution with mean = 5.

Draw a bar graph of these results. How similar are your two graphs?

Now compare

 B(10, 0.1)  with  P(1)
 B(100, 0.1)  with P(10)
 B(100, 0.01) with P(1)

Which pair of graphs was closest?
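Instead of comparing bar graphs by eye, the exact probabilities of each pair can be compared. This Python sketch measures closeness by the total variation distance (half the sum of the absolute differences between the two sets of probabilities), a measure chosen here for illustration rather than one the worksheet prescribes. Values of x above n are impossible for the binomial, so the whole Poisson probability beyond n is added in. The function names are hypothetical.

```python
import math

def binomial_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(mean, k):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def distance(n, p):
    """Total variation distance between B(n, p) and a Poisson with mean np."""
    mean = n * p
    gaps = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(mean, k)) for k in range(n + 1))
    # Poisson probability of values above n, where the binomial probability is 0
    tail = max(0.0, 1 - sum(poisson_pmf(mean, k) for k in range(n + 1)))
    return 0.5 * (gaps + tail)

for n, p in [(10, 0.5), (10, 0.1), (100, 0.1), (100, 0.01)]:
    print(f"B({n}, {p}) vs P({n * p:g}): distance = {distance(n, p):.4f}")
```

A distance of 0 would mean identical distributions; the smaller the value, the closer the pair.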

SIMULATED RESULTS FROM A NORMAL DISTRIBUTION

Follow the instructions above on “how to prepare a column” but DO NOT change the number of decimal places.

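Click on Transform then Compute. As for the binomial distribution, choose RV.NORMAL(?,?) from the function box, then type in the mean in place of the first question mark and the standard deviation in place of the second.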

Draw a histogram of your simulated results.
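A normal column can be simulated in Python with `random.gauss`, here with a hypothetical mean of 50 and standard deviation of 10. The sketch groups the values into classes of width 5 (an arbitrary choice) and prints a text histogram together with the sample mean and standard deviation.

```python
import random
import statistics

rng = random.Random(7)
values = [rng.gauss(50, 10) for _ in range(100)]  # hypothetical mean 50, sd 10

# Tally into classes of width 5: each value goes to the class starting at `lower`
width = 5
counts = {}
for v in values:
    lower = width * int(v // width)
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    print(f"{lower:5d}-{lower + width:<5d} {'*' * counts[lower]}")
print(f"mean = {statistics.mean(values):.2f}, sd = {statistics.stdev(values):.2f}")
```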

COMPARING EXPERIMENTAL RESULTS TO A NORMAL DISTRIBUTION

You may wish to compare results from an experiment or data collection exercise with a normal distribution (with the same mean and standard deviation).

Click on graphs


 

Here are the times taken by 48 students to complete a pencil and paper maze. The normal distribution does not look like a good model for these data. However, it can be quite difficult to make a decision on the basis of this kind of graph, particularly for small samples.

TESTING WHETHER THE NORMAL DISTRIBUTION IS A SUITABLE MODEL FOR DATA

A special kind of graph paper called “normal probability paper” can be used to test whether data has come from a population which could be modelled by a normal distribution.
The normal PP plot produced by SPSS is similar to (but not quite the same as) the plots drawn on normal probability paper.
A normal PP plot plots the cumulative proportions of your results on the horizontal axis. On the vertical axis it plots the cumulative proportions that would be obtained from a normal distribution with the same mean and standard deviation.
Cumulative proportions are found by

$$\text{Cumulative proportion} = \frac{\text{Cumulative frequency}}{\text{Total frequency}}$$

If the normal distribution is a suitable model for your data, the points plotted should be close to a straight line. You expect a closer fit for large samples than for small samples.
(A significance test can be obtained by clicking on Statistics - Summarise - Explore.
Click on the grey Plots box and ask for Normality plots with tests.
However, the tests used are NOT ones covered on the A level statistics syllabus.)

EXAMPLE
The pulse rates for a group of students were taken (results in beats per minute). Is the normal distribution a suitable model for this variable?

Pulse Rates
     54     54     56     57     62     63     64     64     66
     68     69     69     71     73     76


 
 
Click on graphs

As the points are quite close to the straight line (bearing in mind that the sample size is small), they could come from a normal distribution.
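The points of a PP plot for the pulse data can be computed in outline with Python's `statistics.NormalDist`. The sketch follows the definition given earlier: the observed cumulative proportion is cumulative frequency ÷ total frequency, and the expected one is the normal cumulative probability with the same mean and standard deviation. Tied values (54, 64, 69) give one point each. SPSS may use a slightly adjusted proportion, so its plotted points need not match these exactly.

```python
from statistics import NormalDist, mean, stdev

pulse = [54, 54, 56, 57, 62, 63, 64, 64, 66, 68, 69, 69, 71, 73, 76]

# Normal model with the same mean and standard deviation as the data
model = NormalDist(mean(pulse), stdev(pulse))
total = len(pulse)

points = []
for x in sorted(set(pulse)):
    # Observed: cumulative frequency / total frequency
    observed = sum(1 for v in pulse if v <= x) / total
    # Expected: cumulative proportion from the normal model
    expected = model.cdf(x)
    points.append((observed, expected))
    print(f"{x:3d}  observed {observed:.3f}  expected {expected:.3f}")

largest_gap = max(abs(o - e) for o, e in points)
print(f"largest gap between observed and expected: {largest_gap:.3f}")
```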
Next we show a normal distribution PP plot for the times taken by 48 students to complete a paper and pencil maze (used in the previous section). Note that the points are not close to a straight line.

THE SIGN TEST

A coin was tossed 20 times. Heads were recorded as “1” and tails as “2”. There were 12 heads and 8 tails. We can test whether the coin is biased by using the sign test (or binomial test).
The null hypothesis is that the coin is fair, i.e. that p = 0.5. SPSS will calculate the probability of a result at least as extreme as 12 heads and 8 tails for B(20, 0.5).

 Click on Statistics

 

This shows that the probability of obtaining twelve heads or more (or twelve tails or more) from a fair coin is 0.5034. Therefore we retain the null hypothesis and conclude that the coin does not appear to be biased. Only if the “Exact Binomial 2-tailed P” is less than 0.05 can we reject the null hypothesis and conclude that the coin is biased.
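The exact two-tailed probability quoted above can be checked with a short Python sketch. It adds the B(20, 0.5) probabilities of 12 or more heads and doubles the result, which is valid because B(n, 0.5) is symmetric. The function name is hypothetical; the `min(1.0, ...)` guard handles an even split such as 10 heads and 10 tails, where doubling would exceed 1.

```python
from math import comb

def binomial_test_two_tailed(successes, n):
    """Exact two-tailed binomial (sign) test of H0: p = 0.5."""
    k = max(successes, n - successes)  # the larger of the two counts
    # P(X >= k) for X ~ B(n, 0.5)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # By symmetry the other tail is equal, so double it (capped at 1)
    return min(1.0, 2 * upper_tail)

p_value = binomial_test_two_tailed(12, 20)
print(f"two-tailed P = {p_value:.4f}")  # two-tailed P = 0.5034
```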