Probability Distributions
USING SPSS FOR SIMULATIONS OF SPECIAL DISTRIBUTIONS
SPSS will generate simulated results from binomial, Poisson or normal
distributions, provided you specify the necessary parameters (e.g. mean,
standard deviation, etc.).
HOW TO PREPARE A COLUMN FOR SIMULATED RESULTS
At the top of the column you wish to use:
Double-click on the grey box
Variable name = heads
In the Change Settings area, click on Type
Change decimal places from 2 to 0
Click Continue
OK
Next, establish your column length.
Decide HOW MANY simulated results you require (30? 50? 100?
500?). For 30 results, use the cursor keys to go down to the 30th cell
and type in a number (any number). If your column length is to be 50, do
this in cell number 50, and so on. (This is not your group size (n) but
the number of times the experiment is to be repeated.)
SIMULATED RESULTS FROM A BINOMIAL DISTRIBUTION
Follow the instructions on preparing a column (see above).
The instructions here will produce simulated results for B(10, 0.5).
The number appearing in each cell represents the number of heads appearing
if 10 coins are tossed.
Click on Transform then Compute
In the Target Variable box, type your
column name (e.g. HEADS).
In the Function box (right-hand side), scroll
down to RV.BINOM(n,p)
Highlight it and click on the arrow to move
it into the top box (Numeric Expression)
Type in the group size n (e.g. 10) in place
of the first question mark
Type in p (e.g. 0.5) in place of the second
question mark.
Click on OK
Change existing variable - OK
The number in each cell represents, for each group (size n = 10), the number
of successes (or heads) obtained.
Click on Graphs
Bar graph - Simple - Define
Enter the variable name (= heads) as the category
axis
Use the Statistics menu to find the mean and
standard deviation
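Outside SPSS, the same kind of simulation can be sketched in a few lines of Python. This is only an illustration alongside the SPSS steps: the function name `simulate_binomial` and the seed are hypothetical choices, and each result is built by counting successes in n independent trials with probability p, which is one simple way (assumed here, not necessarily how RV.BINOM works internally) to draw from B(n, p).

```python
import math
import random
import statistics


def simulate_binomial(n, p, repetitions, seed=None):
    """Return `repetitions` simulated values from B(n, p).

    Each value counts the successes (e.g. heads) in n independent
    trials, each of which succeeds with probability p.
    """
    rng = random.Random(seed)
    # rng.random() < p is True with probability p; summing the booleans counts successes
    return [sum(rng.random() < p for _ in range(n)) for _ in range(repetitions)]


# 30 repetitions of tossing 10 fair coins, matching B(10, 0.5) above
heads = simulate_binomial(n=10, p=0.5, repetitions=30, seed=1)
print("results:", heads)
print("mean:", statistics.mean(heads), "(theory np =", 10 * 0.5, ")")
print("sd:", round(statistics.stdev(heads), 3),
      "(theory sqrt(npq) =", round(math.sqrt(10 * 0.5 * 0.5), 3), ")")
```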
INVESTIGATION
INCREASE THE NUMBER OF TRIALS
(Click on Window and choose the data file if you
are in the Chart file.)
Enter a number in cell no. 100 in column
1 in order to generate 100 results (you may need to use the cursor
keys), OR start a new column in column 2.
Click on transform
Compute
OK
Change existing variable - OK
Look at a new bar graph for larger numbers of trials.
What do you notice?
What are the mean and standard deviation this time?
Is the mean closer to np than it was before?
Is the standard deviation closer to √(npq), where q = 1 - p?
Investigate what happens if you increase the number of trials again
(to, say, 500).
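The effect of increasing the number of trials can also be sketched in Python, as an illustration rather than a replacement for SPSS. The loop below repeats the B(10, 0.5) simulation for 30, 100, 500 and 5000 trials and prints the mean and standard deviation next to np and √(npq); the seed and the trial counts are arbitrary choices.

```python
import math
import random
import statistics

N, P = 10, 0.5
rng = random.Random(2)  # fixed seed so the run can be repeated


def one_result():
    """Number of heads when N coins, each with P(head) = P, are tossed."""
    return sum(rng.random() < P for _ in range(N))


print("theory: mean =", N * P, " sd =", round(math.sqrt(N * P * (1 - P)), 3))
for repetitions in (30, 100, 500, 5000):
    results = [one_result() for _ in range(repetitions)]
    print(f"{repetitions:>5} trials: mean = {statistics.mean(results):.3f}, "
          f"sd = {statistics.stdev(results):.3f}")
```

With more trials the simulated mean and standard deviation usually settle closer to the theoretical values, although any single run can wander.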
FURTHER INVESTIGATIONS WITH BINOMIAL DATA
1.) Keep the number of trials (column length) the same but change
the value of p (e.g. 0.1, 0.2, 0.3, 0.4, etc.).
2.) Compare p = 0.1 with p = 0.9
3.) What happens if you keep p the same but change n?
NOTE that the bar graphs produced by SPSS here omit any value of x
with zero frequency, so the category axis may skip values.
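To see the zero frequencies that the bar graph leaves out, a frequency table can list every possible value x = 0, 1, ..., n. A minimal Python sketch follows; `frequency_table` is a hypothetical helper, and the simulated B(10, 0.1) data (50 results, fixed seed) merely stand in for your own column.

```python
import random
from collections import Counter


def frequency_table(results, n):
    """Frequency of every possible value x = 0, 1, ..., n, including zeros."""
    counts = Counter(results)  # a Counter returns 0 for values never seen
    return {x: counts[x] for x in range(n + 1)}


rng = random.Random(3)
n, p = 10, 0.1
results = [sum(rng.random() < p for _ in range(n)) for _ in range(50)]
for x, freq in frequency_table(results, n).items():
    print(f"x = {x:>2}: {freq:>3} " + "*" * freq)
```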
TO GENERATE SIMULATED RESULTS FROM A POISSON DISTRIBUTION
Follow the instructions above on how to prepare a column. The length
of your column will represent the number of samples. The result appearing
in each cell will be the result for each sample, e.g. the number of arrivals in
a one-minute interval, or the number of fish in a 10-litre sample of water.
Click on Transform
Then Compute
In the Target Variable box, type
your column name (e.g. Fish)
In the Function box on the right-hand side, scroll
down to RV.POISSON(mean)
Highlight it and click on the arrow to move it into the
top box (Numeric Expression)
Enter the mean (e.g. 5)
Click on OK
Change existing variable - OK
Draw a bar graph of your simulated results
Use the Statistics menu to find the mean and standard deviation.
Use your calculator to square the value of the standard deviation
you obtained.
Why would you expect this answer to be similar in value to the mean?
Try increasing your column length.
Does increasing the number of samples (i.e. the column length) give
values of the mean and variance that are closer to each other?
Why might this happen?
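The same Poisson experiment can be sketched in Python, as an illustration only. `simulate_poisson` is a hypothetical helper that uses Knuth's multiplication method, a standard textbook technique and not necessarily what RV.POISSON does internally. It prints the mean and the squared standard deviation for increasing numbers of samples; for a Poisson distribution the mean and the variance are equal, so the two printed values should tend to agree more closely as the samples grow.

```python
import math
import random
import statistics


def simulate_poisson(mean, samples, seed=None):
    """Simulate `samples` values from a Poisson distribution with the given mean.

    Knuth's method: keep multiplying uniform random numbers together until
    the product falls to e^(-mean) or below; the number of extra factors
    needed is a Poisson-distributed count.
    """
    rng = random.Random(seed)
    limit = math.exp(-mean)
    values = []
    for _ in range(samples):
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        values.append(count)
    return values


for samples in (30, 500, 5000):
    fish = simulate_poisson(mean=5, samples=samples, seed=samples)
    sd = statistics.stdev(fish)
    print(f"{samples:>5} samples: mean = {statistics.mean(fish):.3f}, "
          f"sd squared = {sd ** 2:.3f}")
```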
COMPARING BINOMIAL AND POISSON DISTRIBUTIONS
The Poisson distribution was often used as an approximation to the
binomial distribution, because its probabilities are easier to calculate (if
you have no calculator or computer).
Just how close are the distributions?
Generate 100 simulated results from a binomial distribution with
n = 10 and p = ½.
Draw a bar graph of your results.
Now generate 100 simulated results from the Poisson distribution with
mean = 5.
Draw a bar graph of these results. How similar are your two graphs?
Now compare
B(10, 0.1) with P(1)
B(100, 0.1) with P(10)
B(100, 0.01) with P(1)
Which pair of graphs is closest?
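The simulations compare the distributions by eye. As a rough numerical cross-check (a sketch, not something SPSS produces here), the exact probabilities can be computed from the binomial and Poisson formulas and the largest difference between them reported for each pair. The helper names and the "largest gap" measure are illustrative choices, and the gap is taken only over x = 0, ..., n. Requires Python 3.8 or later for `math.comb`.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mean):
    """P(Y = x) for Y ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def largest_gap(n, p):
    """Largest |P(X = x) - P(Y = x)| over x = 0..n, X ~ B(n, p), Y ~ Poisson(np)."""
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, n * p))
               for x in range(n + 1))


for n, p in [(10, 0.5), (10, 0.1), (100, 0.1), (100, 0.01)]:
    print(f"B({n}, {p}) vs P({n * p:g}): largest gap = {largest_gap(n, p):.4f}")
```

The smaller the gap, the closer the two bar graphs should look, apart from random variation in the simulations.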
SIMULATED RESULTS FROM A NORMAL DISTRIBUTION
Follow the instructions above on how to prepare a column but DO NOT change
the number of decimal places.
Click on Transform
then Compute
In the Target Variable box, type in the
column name you have used
Scroll down the Function box on the right-hand
side and select RV.NORMAL.
Highlight it and click on the arrow to move it into
the top box (Numeric Expression).
Enter the mean and standard deviation (in that order)
Click on OK
Change existing variable - OK
Draw a histogram of your simulated results.
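For comparison, a minimal Python sketch draws normal values with `random.Random.gauss` and prints a crude text histogram. The parameters (mean 100, standard deviation 15), the sample size of 200 and the class width of 10 are hypothetical example values.

```python
import math
import random
import statistics

rng = random.Random(6)   # fixed seed so the run can be repeated
mu, sigma = 100, 15      # hypothetical mean and standard deviation
values = [rng.gauss(mu, sigma) for _ in range(200)]

print(f"sample mean = {statistics.mean(values):.2f}, "
      f"sample sd = {statistics.stdev(values):.2f}")

# Crude text histogram with classes of width 10 (e.g. 90-100, 100-110)
width = 10
counts = {}
for v in values:
    left = width * math.floor(v / width)   # lower boundary of v's class
    counts[left] = counts.get(left, 0) + 1
for left in sorted(counts):
    print(f"{left:>4}-{left + width:<4} {'*' * counts[left]}")
```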
COMPARING EXPERIMENTAL RESULTS TO A NORMAL DISTRIBUTION
You may wish to compare results from an experiment or data-collection exercise
with a normal distribution (with the same mean and standard deviation).
Click on graphs
Choose Histogram
Enter your variable
Click on display normal curve
OK
Here are the times taken by 48 students to complete a pencil-and-paper
maze. The normal distribution does not look like a good model for these
data, but it can be quite difficult to make a decision on the basis of
this kind of graph, particularly for small samples.
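One way to make the comparison depend less on the eye is to tabulate observed class frequencies against the frequencies expected from a normal distribution with the same mean and standard deviation: for a class from a to b, the expected count is N × (Φ(b) - Φ(a)). The Python sketch below does this for the pulse-rate data used in the example later in this handout, since the maze times are not reproduced here. The class boundaries 50, 55, ..., 80 are an arbitrary choice and need not match the bars SPSS draws. Requires Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist

pulse = [54, 54, 56, 57, 62, 63, 64, 64, 66, 68, 69, 69, 71, 73, 76]
fitted = NormalDist.from_samples(pulse)  # same mean and (sample) sd as the data
edges = list(range(50, 85, 5))           # class boundaries 50, 55, ..., 80

rows = []
for lower, upper in zip(edges, edges[1:]):
    observed = sum(lower <= x < upper for x in pulse)
    expected = len(pulse) * (fitted.cdf(upper) - fitted.cdf(lower))
    rows.append((lower, upper, observed, expected))
    print(f"{lower}-{upper}: observed {observed:>2}, expected {expected:5.2f}")
```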
TESTING WHETHER THE NORMAL DISTRIBUTION IS A SUITABLE MODEL FOR DATA
A special kind of graph paper called normal probability paper can be
used to test whether data has come from a population which could be modelled
by a normal distribution.
The normal PP plot produced by SPSS is similar to (but not quite
the same as) the plots drawn on normal probability paper.
A normal PP plot plots the cumulative proportions of your results
on the horizontal axis. On the vertical axis it plots the
cumulative proportions that would be obtained from a normal distribution
with the same mean and standard deviation.
Cumulative proportions are found by
cumulative proportion = cumulative frequency / total frequency
If the normal distribution is a suitable model for your data, the points
plotted should be close to a straight line. You expect a closer fit for
large samples than for small samples.
(A significance test can be obtained by clicking on Statistics - Summarise
- Explore.
Click on the grey Statistics
box and ask for Normality plots with tests.
However, the tests used are NOT ones covered on the A level statistics
syllabus.)
EXAMPLE
The pulse rates of
a group of 15 students were taken (in beats per minute). Is the normal
distribution a suitable model for this variable?
Pulse rates (beats per minute):
54, 54, 56, 57, 62, 63, 64, 64, 66, 68, 69, 69, 71, 73, 76
Click on Graphs
Normal PP
Select heart (the pulse-rate variable) as your variable
OK
As the points are quite close to the straight line (bearing in mind that
the sample size is small) they could come from a normal distribution.
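The points on the plot can be computed by hand from the definition given above: for each distinct pulse rate, the observed cumulative proportion (cumulative frequency / total frequency) is paired with the cumulative proportion from a normal distribution with the same mean and standard deviation. This Python sketch assumes that definition; SPSS may use a slightly different formula for the observed proportions, so its plot will not match these numbers exactly. Requires Python 3.8 or later.

```python
from collections import Counter
from statistics import NormalDist

pulse = [54, 54, 56, 57, 62, 63, 64, 64, 66, 68, 69, 69, 71, 73, 76]
fitted = NormalDist.from_samples(pulse)  # mean 64.4 and the sample sd

counts = Counter(pulse)
points = []
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    observed = cumulative / len(pulse)   # cumulative frequency / total frequency
    expected = fitted.cdf(value)         # normal cumulative proportion
    points.append((value, observed, expected))
    print(f"{value}: observed {observed:.3f}, expected {expected:.3f}")
```

If the normal model fits, the observed and expected columns should stay close together, which is what a near-straight PP plot shows.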
Next we show a normal
PP plot for the times taken by 48 students to complete a pencil-and-paper
maze (used in the previous section). Note that the points are not
close to a straight line.
THE SIGN TEST
A coin was tossed 20 times. Heads were recorded as 1, tails as
2. There were 12 heads and 8 tails. We can test whether the coin is biased
by using the sign test (or binomial test).
The null hypothesis is that the coin is fair, i.e. p = ½. SPSS
will calculate, for B(20, ½), the probability of a result at least as
extreme as 12 heads and 8 tails.
Click on Statistics
Choose Nonparametric Tests
Click on Binomial
Put COIN into test variable list
Click on OPTIONS
Descriptive statistics
Continue
OK.
This shows that the probability of obtaining twelve or more heads (or
twelve or more tails) from a fair coin is 0.5034. We therefore cannot reject
the null hypothesis: there is no evidence that the coin is biased.
Only if the Exact Binomial 2-Tailed P is less than 0.05 can we reject
the null hypothesis (at the 5% significance level) and conclude that the coin is biased.
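The exact two-tailed probability that SPSS reports can be checked directly from the binomial formula. This Python sketch assumes the usual definition of the two-tailed p-value when p = ½ (double the probability of a result at least as extreme as the one observed, capped at 1); `sign_test_two_tailed` is a hypothetical helper name. Requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def sign_test_two_tailed(successes, trials):
    """Exact two-tailed sign test p-value for H0: p = 1/2.

    Doubles the probability of a result at least as far from trials/2
    as the one observed, capped at 1.
    """
    k = max(successes, trials - successes)   # the larger of heads and tails
    upper_tail = sum(comb(trials, x) for x in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * upper_tail)


p_value = sign_test_two_tailed(12, 20)
print(f"two-tailed p = {p_value:.4f}")  # → two-tailed p = 0.5034
```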