OJNI 

Statistics 1.3 and 1.4: Comparing Proportions
2.0 contact hours

Course Objectives

At the conclusion of this 2.0 unit course the learner will be able to:

1. Describe why estimates will vary from sample to sample about the population proportion.
2. Describe how the standard error of an estimated proportion measures how closely it approximates the population proportion.
3. Calculate the standard error of a proportion using the formula provided.
4. Describe the statistical test that is used to compare the proportion of success in the control group with the proportion of success in the new treatment group.
5. Describe the difference between a Type I error and a Type II error.

Questions of Current Interest

Estimates and Standard Errors

For example, members of a group who are homeless are difficult to include in a count, or members of a household may go intentionally unreported. There is also some disagreement about the classification of individual members of a group. How do we decide when an individual is Latino, or when another is Asian-American or Caucasian?

Some schools will be closer to and some further from this population value. (Note that the freshmen in a particular unit of the university system may not constitute a random sample, and so the proportion of Asian-Americans will not reflect the 0.2 figure. The number will reflect, however, the proportion of Asian-Americans in the catchment area for the particular unit of the university system, e.g. rural, urban, etc.)

1. Distribution of estimates
2. Standard error of estimate
3. Confidence limit on the proportion
As we noted earlier, when we start to estimate a proportion by taking random samples of subjects from a population, we recognize that our estimates will vary from sample to sample. The manner in which estimates scatter about the population value is the distribution of estimates. Many distributions of estimates, including the distribution of proportions, follow the normal distribution closely. The amount by which a proportion differs from the population value is measured in terms of the standard error of estimate of the proportion.
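The way estimates scatter about the population value can be illustrated with a short simulation. The sketch below (Python, reusing the illustrative 0.2 population proportion from above and an assumed sample size of 400) draws repeated random samples and shows that the scatter of the estimates is close to the standard error SQRT{ p(1 - p)/n } = 0.02:

```python
import random

random.seed(1)
p, n = 0.2, 400  # population proportion and sample size (illustrative values)

# Draw 1000 random samples of size n and record the estimated proportion in each
estimates = [sum(random.random() < p for _ in range(n)) / n for _ in range(1000)]

# Mean and scatter (standard deviation) of the estimates
mean = sum(estimates) / len(estimates)
sd = (sum((e - mean) ** 2 for e in estimates) / len(estimates)) ** 0.5
print(round(mean, 3), round(sd, 3))  # mean near 0.2, scatter near 0.02
```

The observed scatter of the estimates approximates the theoretical standard error, which is the point of the paragraph above.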
The properties of the normal distribution are well known. In a normal distribution about half the estimates will lie within two thirds of a standard error of the population proportion. About two thirds of the estimates will lie within one standard error of the population proportion.

Using the standard error of estimate as a guideline, we can state that in 95% (19 out of 20) of the cases the population value will fall within (+/-)1.96 standard errors of the estimate. This region of (+/-)1.96 standard errors is called the 95% confidence limit on the estimated proportion.

Calculating Standard Errors

SE = SQRT{ p(1 - p)/n }

Where
SE = standard error
SQRT = square root
p = proportion
n = sample size
Evidence or No Evidence

1. Calculate the product of the proportion (0.25) and its complement (1 - 0.25). This equals 0.1875.
2. Divide 0.1875 by the sample size (400), giving 0.00046875.
3. Take the square root of 0.00046875, which is 0.022.
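The three arithmetic steps above, together with the 95% confidence limits that follow, can be sketched in Python (the proportion 0.25 and sample size 400 come from the worked example):

```python
import math

def proportion_se(p, n):
    """Standard error of an estimated proportion: SQRT{ p(1 - p)/n }."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.25, 400
se = proportion_se(p, n)   # sqrt(0.1875 / 400), about 0.022
lower = p - 1.96 * se      # lower 95% confidence limit
upper = p + 1.96 * se      # upper 95% confidence limit
print(round(se, 4), round(lower, 3), round(upper, 3))
```

The limits reproduce the region from 0.208 to 0.292 discussed next.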
The estimate and its 95% confidence limits span the region from 0.208 to 0.292. This region may seem wider than you feel comfortable with. If you want narrower confidence limits it would be necessary to employ a larger sample. Considering the relationship for the standard error, you can see that in order to cut the confidence interval in half you would have to take four times as large a sample.

Comparing Two Proportions (p1 and p2)

Taking a sample of size n1 from the control group in a clinical trial, we observe a proportion of success called p1. Taking a sample of size n2 from the new treatment group in the same clinical trial, we observe a different proportion of success called p2.
Since we want to know which treatment is better, we are interested in the difference between the new treatment and the control (p2 - p1). The standard error of the difference between these two proportions is calculated using the following formula, involving the variability of both proportions:

SE(p2 - p1) = SQRT{ p2(1 - p2)/n2 + p1(1 - p1)/n1 }

Where
SE = standard error, in this case of the difference in proportions
SQRT = square root
p = proportion
n1 = control sample size
n2 = new treatment sample size

Suppose now that we observe 20 out of 50 successes in the control group (p1 = 0.4) and 30 out of 60 successes in the new treatment group (p2 = 0.5).

SE = SQRT{ 0.5(0.5)/60 + 0.4(0.6)/50 } = 0.09

(p2 - p1) +/- 1.96 SE = 0.1 +/- 1.96(0.09) = 0.1 +/- 0.18

0.1 + 0.18 = +0.28
0.1 - 0.18 = -0.08

If the value zero is located between the two values, within the confidence limits (+0.28 and -0.08), then there is no evidence that the difference is other than chance. The above range does include the value zero, so it is concluded that there is no evidence of a difference between the two treatment regimens.

The range of uncertainty (the confidence interval) does include zero, but it also includes other values up to 0.28, which might well be of considerable interest. How could we be sure that if there were a difference of, for example, 0.25 we would find evidence for it? This is a question of sample size, which we will address in the next section.

Sample Size

Suppose now we want to conduct a trial that will distinguish between "no difference" and a difference of 0.25. We have already indicated how we can be correct in a decision of no evidence when there is no difference in 19 out of 20 trials by using the confidence limits (+/-)1.96.
Let us now assume that we would like to be correct in our decision of evidence when there is a difference of 0.25 in 19 out of 20 trials. The formula for this situation is:

n per group = 13{ p2(1 - p2) + p1(1 - p1) } / (p2 - p1)^2

Where
n = sample size
p1 = proportion of successes in the control group
p2 = proportion of successes in the new treatment group

We usually have a reasonable idea of p1, as it represents the proportion of success expected with the usual procedure, and our interest is in an improvement of 0.25. Let us suppose p1 = 0.35 and hence p2 = 0.6.

n per group = 13{ (0.6)(0.4) + (0.35)(0.65) } / (0.6 - 0.35)^2 = 97 subjects per group

It would thus be sensible to plan for 100 subjects per group.
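The sample-size arithmetic can be checked with a short function (a Python sketch; the constant 13 is the document's own factor, close to (1.96 + 1.64)^2, consistent with accepting a 5% risk of each type):

```python
def n_per_group(p1, p2):
    """Approximate sample size per group: 13{ p2(1-p2) + p1(1-p1) } / (p2 - p1)^2."""
    return 13 * (p2 * (1 - p2) + p1 * (1 - p1)) / (p2 - p1) ** 2

# p1 = 0.35 for the usual procedure, p2 = 0.60 for the hoped-for improvement of 0.25
n = n_per_group(0.35, 0.60)
print(round(n))  # about 97 subjects per group
```

Rounding the 97 up to a planned 100 subjects per group, as the text suggests, is sensible.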
(The assumptions made about p1 and p2 - p1 make rounding off acceptable.)

Confidence and Risk

We have talked about the use of confidence limits on estimates to assure ourselves that on 19 of 20 occasions the limits will include the true value. We have talked about calculating the sample size from the difference between treatments so that we are confident that on 19 of 20 occasions we will provide evidence for a difference. Statisticians talk about risk or confidence. If we are confident on 19 of 20 occasions then this represents 95% confidence, and if we have 95% confidence we accept a 5% risk.

If we are assured that 19 out of 20 occasions are correct, then we are taking a risk that 1 out of 20 occasions will be in error. The 1 in 20 occasions in which the confidence intervals on an estimate will not include the true value is termed a Type I error, and the 1 in 20 occasions when we fail to provide evidence for a difference is termed a Type II error. Risk choices other than 5% can be made, but there is little evidence to support the use of different risks. In my own practice over the years I have used a 5% risk factor, and this is quite standard practice. In the literature you may observe that Type II risks are often chosen greater than 5% in order to keep sample sizes down. This is rarely an acceptable maneuver.

Some statisticians and contemporary software programs have adopted the practice of calculating exact risks, for example reporting an exact risk of 7.13% in their output. It is my feeling that this is specious and there is no practical use for this information.
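To tie the risk discussion back to the earlier two-sample example (20 out of 50 successes in the control, 30 out of 60 in the new treatment), the evidence decision at a 5% risk can be sketched as:

```python
import math

def diff_ci(p1, n1, p2, n2, z=1.96):
    """95% confidence limits on (p2 - p1), with SE = SQRT{ p2(1-p2)/n2 + p1(1-p1)/n1 }."""
    se = math.sqrt(p2 * (1 - p2) / n2 + p1 * (1 - p1) / n1)
    diff = p2 - p1
    return diff - z * se, diff + z * se

lower, upper = diff_ci(20 / 50, 50, 30 / 60, 60)
# Zero inside the limits means no evidence of a difference at the 5% risk level
print(round(lower, 3), round(upper, 3), lower < 0 < upper)
```

The limits come out near the text's +0.28 and -0.08; the small discrepancy arises because the text rounds the standard error to 0.09 before multiplying by 1.96.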
Conclusion

1. We have shown how to determine the statistical uncertainty in a calculated proportion from data, and how to compare a proportion with a standard or expected proportion as well as comparing two different proportions.
2. We have pointed out aspects of data collection that may introduce considerable additional uncertainty.
3. We have shown how to determine whether to attribute an observed difference to statistical uncertainty or to a genuine effect.
4. We have further shown how to determine the magnitude of samples so that we are sure to detect a difference of substantive importance in proportions.

Copyright © 1999-2000 Wild Iris Medical Education