Chapter 14 Two Sample Hypothesis Testing

14.1 Overview

In chapters 11 and 12 we learned about methods for statistical inference and hypothesis testing when evaluating one sample of data, whether proportions or continuous data, and in the latter case whether σ was known or not. In those situations, we compared our sample results to some prior estimate of μ or p, or used our sample to estimate a likely range for where the true value of μ or p lay.

In this chapter we will discuss methods of statistical inference and hypothesis testing for comparing two separate samples of data. For example, maybe these samples are taken from different groups and we want to evaluate if their means are the same.

Specifically, we will discuss analysis of:

  • The difference between two proportions, e.g. H0: p2 - p1 = 0
  • The difference of two means, e.g. H0: μ1 = μ2
  • Paired data, for repeated measures on the same experimental unit, e.g. H0: μdiff = 0

In each of these situations, we will see that our general framework for hypothesis testing and inference remains the same; similar to how we approached the t-distribution, we will slightly vary our null distribution, standard error calculation, and test statistic depending on the specific test.

14.2 Analysis of the Differences Between Proportions

We’ve previously looked at how to evaluate if an observed proportion reasonably matches our prior estimate. For example, we could ask “does an observed sample reasonably come from a population with p=0.5?”, or “Is there a statistically significant difference between our sample and p=0.5?”

Now, we’re going to learn how to evaluate whether two samples have the same value of p. As an obvious example, think of a medical trial. We give an experimental drug to one group and a placebo (sugar pill) to another group, and then we want to evaluate if the responses are statistically the same (or different) between these two groups. For this section, we’ll assume the response will be binary, i.e. either the treatment worked or not.

Our general approach for hypothesis testing will be the same as we previously did:

  • Develop a null and alternative hypothesis and determine our “α-level”
  • Figure out our null distribution and critical values based on α
  • Collect the data and calculate our “test-statistic”
  • Calculate how likely our observed results are under the null distribution, including the p-value
  • Reject or fail to reject H0 (if for example the p-value is less than α=0.05)

Importantly our null distribution, standard error (SE), and test-statistic will all be slightly different than what we found when evaluating a single sample.

14.2.1 Learning Objectives

After this section, you should be able to:

  • Run a hypothesis test for the difference of proportions
  • Describe the calculations of the null distribution, standard error (SE), and test-statistic for hypothesis testing of difference of proportions
  • Calculate a confidence interval on the true difference between two observed proportions

14.2.2 Comparing Proportions in Two Different Groups

Let’s start with an example.

A survey of 827 randomly sampled registered voters in California about their views on drilling for oil and natural gas off the Coast of California asked, “Do you support? Or do you oppose? Or do you not know enough to say?” Below is the distribution of responses, separated based on whether or not the respondent graduated from college.

Results        College   Not
Support          154      132
Oppose           180      126
Do Not Know      104      131
Total            438      389

Based on this data, we might ask: “Is there strong evidence to suggest that the proportion of non-college graduates who support off-shore drilling in CA is different than the proportion of college graduates who do?”

For this question, what we’re really asking is if the proportion of college graduates who support off-shore drilling (we’ll call this p1) is equal to the proportion of non-college graduates who support off-shore drilling (we’ll call this p2).

Based on the above definitions, our null hypothesis is H0: p1 - p2 = 0 (i.e. no difference between the two) and our alternative hypothesis is HA: p1 - p2 ≠ 0.

Of course, we might see an observed difference that isn’t exactly zero, but what we are asking is if it makes sense that any observed difference occurs simply as the result of random sampling.

One slight difference with this test (compared with our one sample version) is that we can’t fully set up our null distribution before collecting the data: we need to know the size of each group, n1 and n2, as well as the overall proportion.

So, based on our data, what is the total proportion of individuals who support off-shore drilling, regardless of education level? From above we find this as p = (total # who support)/(total # surveyed) = (154 + 132)/(438 + 389) = 0.3458.
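In R, this pooled proportion is quick to compute directly from the table:

```r
# Supporters (college + non-college) over all respondents surveyed
p_pool <- (154 + 132) / (438 + 389)
p_pool
## [1] 0.3458283
```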

Note that by saying p1=p2, we’re basically saying they both equal p, the overall proportion, and that the group someone is in (college graduate or not) does not affect their views about offshore drilling.

14.2.3 Determining the Null Distribution

Our next step is to determine our null distribution. Based on the CLT we’ll write p̂1 - p̂2 ~ N(0, SE), basically saying that the expected difference is centered at 0 with a normal distribution. While this is technically correct, we will modify it slightly; more details on that below. First, we need to determine the value of the standard error, SE, which requires a different calculation than before.

To find the standard error, note that we now have two proportions to deal with, so we will need to combine them. We’ll use the following equation based on the pooled proportion p (as determined above) and the size of each group: SE = sqrt(p(1 - p)(1/n1 + 1/n2))

Using our specific values for p, n1 and n2 we have:

se <- sqrt(0.3458*(1-0.3458)*(1/438+1/389))
se
## [1] 0.03313665

(I won’t derive why this works here, but for those of you interested there’s more information further below in the notes.)

Now we’re in a position to fully define our null distribution. As we discussed at the end of chapter 12, we could use p̂1 - p̂2 ~ N(0, SE), but in fact a better approach is to “z-score” this and instead write: ((p̂1 - p̂2) - 0)/SE ~ N(0, 1)

This latter version makes finding critical values and p-values a little easier, at the expense of a slightly more complex test statistic.

Using this latter form, we can easily find our critical values (at α=0.05) as

cat("lower:", round(qnorm(0.025, 0, 1), 3), "\n")
cat("upper:", round(qnorm(0.975, 0, 1), 3), "\n")
## lower: -1.96 
## upper: 1.96

but we probably knew that already…

14.2.4 Calculating the Test Statistic

Now we are at the point where we can calculate our test statistic. First, let’s find p̂1 and p̂2 as p̂1 = 154/438 = 0.3516 and p̂2 = 132/389 = 0.3393. Obviously these are NOT the same, but is the difference significant?

From here we can calculate our test statistic, using the general form ((p̂1 - p̂2) - 0)/SE (where we subtract 0 as a placeholder because that is the value of the difference in the null hypothesis), or

(0.3516-0.3393)/0.03314
## [1] 0.3711527

As mentioned above, under the null hypothesis of no difference between groups, this test statistic will have a standard Normal distribution, i.e. (p̂1 - p̂2)/SE ~ N(0, 1).

So our question of interest is, how likely are we to see a value as extreme as 0.3711 given our null distribution?

Based on our critical values, we already know we will fail to reject H0 since this value is within the critical values. Similarly, since this is a two-sided test, we can calculate our p-value as:

2* pnorm(0.3711, 0, 1, F)
## [1] 0.7105631

Since this is much larger than α=0.05 we fail to reject our H0 and conclude there is no statistically significant difference between college graduates and non-graduates in their support for off-shore drilling.
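The whole manual calculation can be collected into a few lines of R (all numbers here come from the worked example above):

```r
# Supporters and totals: college graduates (group 1), non-graduates (group 2)
y1 <- 154; n1 <- 438
y2 <- 132; n2 <- 389

p_pool <- (y1 + y2) / (n1 + n2)                        # pooled proportion
se     <- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))  # pooled standard error
z      <- (y1/n1 - y2/n2) / se                         # test statistic
p_val  <- 2 * pnorm(abs(z), lower.tail = FALSE)        # two-sided p-value
round(c(z = z, p_val = p_val), 4)
##      z  p_val 
## 0.3702 0.7113
```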

14.2.5 Using prop.test() in R

There is a built in function in R that approximates our results. We can use the prop.test() function as:

prop.test(c(154, 132), c(438, 389), correct=F)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  c(154, 132) out of c(438, 389)
## X-squared = 0.13703, df = 1, p-value = 0.7113
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.05264371  0.07717682
## sample estimates:
##    prop 1    prop 2 
## 0.3515982 0.3393316

Be careful about how you pass the data and note the first parameter c(154,132) is the vector of the number of successes for each group, and the second parameter c(438, 389) is a vector of the sample size of each group. As shown, use c() to create these vectors based off your data. Also, it is important to turn off the continuity correction using correct=F.

The prop.test() function may yield slightly different results than our manual calculation, particularly for small sample sizes. In the above analysis, we used the CLT to create our null distribution; however, the assumption of normality may not be entirely valid. So be aware of the potential for slight differences in both test statistics and p-values.

Part of the output of this function call is X-squared = 0.13703. This is (approximately) the value of our previously calculated test statistic (0.3712) “squared”:

0.3712^2
## [1] 0.1377894

For the reasons noted above these are approximately equal, and should, except in rare cases, lead to the same conclusion.

14.2.6 Optional: The null hypothesis, point estimate and test statistic

Imagine we have two different groups, each with its own number of trials and successes. Let Y1 be the number of successes in the first group and n1 be the total number of participants. Then p̂1 = Y1/n1. Similarly, p̂2 = Y2/n2.

Our null hypothesis (H0) here is there is no statistically significant difference between the proportions of the two groups. In particular, we can quantify this as p1 - p2 = 0. Our alternative hypothesis is that there is some difference, i.e. p1 - p2 ≠ 0.

To determine our test statistic, we start with p̂1 - p̂2 (using the “hats” because it’s observed data), which we will then scale by the standard error. The value p̂1 - p̂2 is our point estimate; it represents our best estimate of the difference between the two groups.

Our general approach will be: Z = (point estimate - null value)/standard error

So, in this case, under the null hypothesis of no difference, our test statistic is then going to be ((p̂1 - p̂2) - 0)/SE. Importantly, this has a distribution of N(0, 1).

Note: This is equivalent to the transformation we used when we said that if X ~ N(μ, σ), then (X - μ)/σ ~ N(0, 1).

14.2.7 Optional: Calculating the Standard Error

For comparing the proportions of two groups, we want to know the standard error of p1 - p2. In chapter 11 we found the standard error of one group, p, as SE = sqrt(p(1 - p)/n). We’ll see a similar, although slightly more complicated, result here.

The standard error of p1 - p2 is the square root of the variance of p1 - p2.

As previously discussed, “variances add”, which means we can write:

Var[p1 - p2] = Var[p1] + Var[p2]

Now, we know Var[p1] = p1(1 - p1)/n1 (see chapter 11), and so we can rewrite the above equation as:

Var[p1 - p2] = p1(1 - p1)/n1 + p2(1 - p2)/n2

and so for comparing the proportions of two groups, our standard error is:

SE = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)

This is the general case and is true regardless of H0.

Now, when we assume that p1 and p2 are the same (which is a typical H0), then we are really assuming that they both equal the overall p for the combined or ‘pooled’ groups. This calculation, repeated from above, is:

p = (Y1 + Y2)/(n1 + n2) = (p̂1 n1 + p̂2 n2)/(n1 + n2)

In this special case we can substitute p1=p and p2=p and then simplify our standard error as:

SE = sqrt(p(1 - p)/n1 + p(1 - p)/n2) = sqrt(p(1 - p)(1/n1 + 1/n2))

which is the result presented above. Note this approach is similar to our one sample standard error, where we basically “average” (in a slightly odd way) across the two group sizes.
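As a quick sanity check on this simplification, we can verify numerically (using the group sizes and pooled proportion from the drilling example) that the two forms agree:

```r
# Pooled proportion and group sizes from the off-shore drilling example
p  <- (154 + 132) / (438 + 389)
n1 <- 438; n2 <- 389

se_long  <- sqrt(p*(1 - p)/n1 + p*(1 - p)/n2)  # before factoring out p(1-p)
se_short <- sqrt(p*(1 - p) * (1/n1 + 1/n2))    # after factoring
all.equal(se_long, se_short)
## [1] TRUE
```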

14.2.8 Inference on the True difference between p1 and p2

As before we can also compute a confidence interval on the true value of p1p2. Here though we’ll use our more general calculation of SE (from the previous section), since we don’t know p:

SE = sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)

Plugging in the values for the off-shore drilling question we find:

se <- sqrt(0.3516*(1-0.3516)/438 + 0.3393*(1-0.3393)/389)
se
## [1] 0.03311772

which is very similar to our previous estimate (i.e. when we just used the overall proportion).

We can then use the p̂1 - p̂2 ± 1.96 × SE approach to find the limits of the 95% confidence interval on the true value of p1 - p2 as:

(0.3516-0.3393)-1.96*0.03312
(0.3516-0.3393)+1.96*0.03312
## [1] -0.0526152
## [1] 0.0772152

Note that here the center of our interval is the observed difference in proportions (i.e. the point estimate).

What do you notice about the interval? What value does it contain?

14.2.9 Guided Practice

In a random sample of 1500 First Nations children in Canada, 162 were in child welfare care. In a different random sample of 1600 non-Aboriginal children, 23 were in child welfare care. Many people believe that the large proportion of indigenous children in government care is a humanitarian crisis. Do these data give significant evidence that a greater proportion of First Nations children in Canada are in child welfare care than the proportion of non-Aboriginal children in child welfare care?

(from Barron’s AP Stats p272)

Do this (i) manually and (ii) using the prop.test() function in R. Confirm that you get the same results.

14.2.10 Summary of the Hypothesis Testing Statistics

value               Difference in Proportions (for test of no difference)
null hypothesis     H0: p1 - p2 = 0
null distribution   N(0, 1)
point estimate      p̂1 - p̂2
SE                  SE = sqrt(p(1 - p)(1/n1 + 1/n2)), where p = (Y1 + Y2)/(n1 + n2) = (p̂1 n1 + p̂2 n2)/(n1 + n2)
test statistic      ((p̂1 - p̂2) - 0)/SE

Note that I’ve introduced the term point estimate here to distinguish the observed difference between our proportions from the test statistic, since the test statistic is the scaled version of the point estimate. The point estimate is our best estimate of the true difference between the groups.

14.2.11 Review of Learning Objectives

After this section, you should be able to:

  • Run a hypothesis test for the difference of proportions
  • Describe the calculations of the standard error, test statistic and null distribution for hypothesis testing of difference of proportions
  • Calculate a confidence interval on the true difference between two observed proportions

14.3 Difference of Means & Paired Data

In this section, we’ll continue to look at situations where we compare two-samples, now examining those cases where we are analyzing continuous data. As was the case in chapter 12, because of concerns about small samples and not knowing σ, we will typically use the t-distribution here as our null distribution.

Specifically, here we will look at two types of data and hypothesis tests:

  • difference of means, where we have two data sets from two different populations that are not necessarily related or connected, and where we want to test if the means are the same or different. Note that the two sample sizes do NOT need to be the same.
  • paired data, where we have two data sets and where each observation in one set has a corresponding observation in the second data set. Think about patients who are measured twice (before and after?) or locations where the temperature (or other environmental variable) is measured on different dates. Here, we will often want to test if the means of the before and after group are the same or different.

14.3.1 Learning Objectives

After this section, you should be able to:

  • Run a hypothesis test for both the difference of means and paired data
  • Describe the calculations of the standard error, test statistic and null distribution for hypothesis testing of difference of means and paired data
  • Explain the use of the point estimate
  • Calculate a confidence interval on the true difference between means or paired data

14.3.2 Summary of the Hypothesis Testing Statistics

Instead of describing the tests in detail, here we will start with a summary table of the statistics used in our hypothesis testing approach:

value                         Difference of Means             Paired Data
null hypothesis               H0: μ1 - μ2 = 0                 H0: μdiff = 0
null distribution             t, with df degrees of freedom   t, with df degrees of freedom
degrees of freedom, df        min(n1 - 1, n2 - 1)             ndiff - 1
critical values (at α=0.05)   qt(0.025, df), qt(0.975, df)    qt(0.025, df), qt(0.975, df)
point estimate                X̄1 - X̄2                        X̄diff = Σ(X1i - X2i)/n
Standard Error, SE            sqrt(s1^2/n1 + s2^2/n2)         sdiff/sqrt(ndiff)
test statistic                (X̄1 - X̄2)/SE                   X̄diff/SEdiff

where again I’m using the term point estimate. Importantly, we compare the point estimate to the null hypothesis whereas we compare the test statistic to the null distribution. This is a useful distinction when we’re scaling the point estimate (and converting it into the test statistic - the “t-score”) before comparing it to our null distribution.

14.3.3 Notes on Difference of Means Testing

When testing difference of means, we’ll take samples from each group, and then have two separate observed means and sample sizes: ˉX1 and n1 for the first group and ˉX2 and n2 for the second group. Each group also has an observed standard deviation, s1 and s2 respectively. Since we’re using the t-distribution we need to calculate a degrees of freedom, and for that we’ll use one less than the smaller group size. Finally note that the form of the standard error is slightly different than before, particularly in that we are taking the square root of the whole expression and the standard deviations are squared (aka variances).

There is some disagreement about the proper calculation of degrees of freedom for difference of means testing. Some texts suggest using n1 + n2 - 2, and some suggest a more complicated formula that we will not discuss here. Using min(n1 - 1, n2 - 1), as suggested above, is a conservative approach. Suffice it to say that p-values shouldn’t vary too much based on the approach chosen for the degrees of freedom, and remember that if your p-value is close to α, caution in your conclusion is warranted.
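To make the mechanics concrete, here is a short sketch of a difference of means test using made-up summary statistics (the numbers below are hypothetical, purely for illustration):

```r
# Hypothetical summary statistics for two independent groups
xbar1 <- 10.2; s1 <- 2.1; n1 <- 18
xbar2 <- 11.5; s2 <- 2.6; n2 <- 24

se     <- sqrt(s1^2/n1 + s2^2/n2)       # standard error of the difference
t_stat <- (xbar1 - xbar2) / se          # test statistic (null difference of 0)
df     <- min(n1 - 1, n2 - 1)           # conservative degrees of freedom
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-sided p-value
```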

14.3.4 Notes on Paired Data Testing

The purpose of paired data testing is to take two (repeated) samples on a given experimental unit and to evaluate if the average difference is reasonably equal to zero (or some other prior value).

Here we have two samples of equal sizes n and all data points will be “paired”, again think two measurements from the same experimental unit. The typical situation is that the two samples result because each unit was measured twice.

To calculate the point estimate, we first create a vector of length n = ndiff which is the observed difference between the two measurements: Xdiff = X1 - X2. We then take the average of this Xdiff vector, and this average becomes our best estimate of the true difference between the samples. We also calculate the standard deviation of this vector to use in our standard error calculation.

In essence, under paired data testing, once we’ve created the “difference” vector, we then proceed as we would under a one sample t-test approach.
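A minimal sketch of this equivalence, using R’s built-in t.test() and made-up measurements (the vectors below are hypothetical):

```r
# Hypothetical paired measurements on the same 5 experimental units
before <- c(12.1, 11.4, 13.0, 12.7, 11.9)
after  <- c(12.6, 11.9, 13.4, 12.5, 12.8)

paired   <- t.test(after, before, paired = TRUE)  # paired t-test
one_samp <- t.test(after - before, mu = 0)        # one sample test on differences

# The two approaches give identical test statistics (and p-values)
all.equal(unname(paired$statistic), unname(one_samp$statistic))
## [1] TRUE
```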

14.3.5 Inference for Difference of Means and Paired Data

To create confidence intervals, in both cases, we’ll use our point estimate (i.e. observed results) as the center of the interval, and we’ll add/subtract an appropriate number of standard errors, depending on our confidence level.

For example, a 95% confidence interval on the true difference of means using a t-distribution with 20 degrees of freedom would be:

x̄1 - x̄2 ± 2.086 × SE

since

qt(0.025, 20)
## [1] -2.085963

Similarly, a 95% confidence interval on the true difference of paired data using a t distribution with 15 degrees of freedom would be

x̄diff ± 2.131 × SE

since

qt(0.025, 15)
## [1] -2.13145

14.3.6 Guided Practice

We have now done enough hypothesis testing, with the same overall framework each time, that you should be able to proceed without a worked example. We’ll instead jump straight to guided practice, using the summary table above.

Difference in Means:

  1. The following summary statistics represent a random sample of 150 cases of mothers and the birth weights of their newborn infants in North Carolina over a year (p269, os4.pdf). Our question: Is there a statistically significant difference between the birth weights of infants of mothers who smoke compared to those who don’t?
statistic                      smoker   nonsmoker
sample mean, x̄                 6.78     7.18
sample standard deviation, s    1.43     1.60
sample size, n                  50       100
  a. Use a difference of means test to calculate this at the α=0.05 level.
  b. What is a 95% confidence interval around the true difference in means?


  2. You are interested in evaluating the tensile strength of two different filaments (ABS vs PLA) used in 3D printing. Tensile strength is the maximum stress a material can withstand before breaking when being pulled apart, measured in Newtons/m2 or Pascals.

You run an experiment and collect the following data. What are the average tensile strengths (N/m2) of the two materials? Is there a statistically significant difference between the two materials in terms of their average tensile strength?

abs <- c(30.6, 28.1, 28.9, 23.6, 29.5, 31.7, 28.2, 28.7, 32.4, 26.6, 35.7)
pla <- c(34.1, 37.0, 32.4, 31.4, 29.8, 36.2, 34.4, 32.4, 30.5, 34.4, 28.0, 36.3)

Paired Data:

  1. Let’s consider a limited set of climate data, examining temperature differences in 1948 vs 2018. We sampled 197 locations from the National Oceanic and Atmospheric Administration’s (NOAA) historical data, where the data was available for both years of interest. We want to know: Are there statistically significant differences in the number of days with temperatures exceeding 90°F between 2018 and 1948?

First we calculated the number of days exceeding 90°F at each location in both 1948 and 2018. The 1948 data represents one sample and the 2018 data represents a second sample, both of length ndiff=197.

Next, we determined the difference in number of days exceeding 90°F (number of days in 2018 - number of days in 1948) for each of the 197 locations. The average of these differences was X̄diff = 2.9 days with a standard deviation of sdiff = 17.2 days.

  a. Perform a paired data hypothesis test to see if this difference is statistically significant at α=0.05. Be sure to write out your null hypothesis and follow all of our steps. What do you conclude?
  b. Calculate a 90% confidence interval on the true value of the difference in number of days. How does your confidence interval relate to your results from part (a)?


  2. Imagine a certain medical test where 18 patients are measured before and after treatment:
before <- c(19.6, 22.3, 21.6, 18.2, 21.2, 20.0, 20.5, 21.4, 20.3, 19.9, 18.7, 19.3, 20.8, 19.4, 20.8, 22.6, 20.9, 22.1)
after <- c(19.6, 20.0, 22.0, 20.5, 22.4, 23.1, 20.3, 23.1, 24.5, 21.6, 21.4, 23.6, 22.8, 20.4, 21.2, 22.8, 20.0, 21.9)

(NOTE: If you’re given both data sets, you should first create a new vector that is the difference between our groups. Then, proceed similar to how you would with a one sample test.)

  a. Create a vector of length 18 that contains the changes per patient that occurred after treatment.
  b. Plot a histogram of this new vector. What do you notice? What does this histogram suggest about whether a difference exists in the sample as a result of treatment?
  c. Calculate the mean and standard deviation of your vector.
  d. Run a hypothesis test at α=0.10 to evaluate if there is a statistically significant difference in patient results before and after treatment.

14.3.7 Review of Learning Objectives

After this section, you should be able to:

  • Run a hypothesis test for both the difference of means and paired data
  • Describe the calculations of the standard error, test statistic and null distribution for hypothesis testing of difference of means and paired data
  • Explain the use of the point estimate
  • Calculate a confidence interval on the true difference between means or paired data

14.4 Review of One-sided and Two-sided tests

We discussed one-sided and two-sided tests in our chapter on one sample hypothesis testing. The same ideas generally apply to two sample testing although it can be a bit more complicated.

The particular issue to be aware of is that how we order our samples matters, depending on whether we’re looking for results that are bigger or smaller than 0. This order needs to be correct and internally consistent across the following three (3) steps:

  • the null and alternative hypothesis
  • the point estimate
  • the side where the critical value lies

14.4.1 Some Guidelines

Suppose we really want to know if the mean/proportion of group A is bigger than that of group B. We will conclude that it is ONLY IF there is ample evidence that A is much bigger than B, where the difference didn’t occur simply by randomness.

This tells us that we will reject H0 only if our test statistic is too big.

Hence our null hypothesis will be that A isn’t bigger than B (i.e. A is either smaller than or the same as B), which we’ll write as H0: A ≤ B, and in line with this our alternative hypothesis will then be that A is actually bigger than B, so HA: A > B.

Based on how we’ll reject (only if the difference is too big), we only want an upper critical value. And finally our test statistic needs to be (A - B)/SE, because if this is big (positive) it means that A is bigger than B.

The same approach works for testing if the mean/proportion of group C is less than that of group D, except the other way.

(Note that if we wrote this the other way, we’d be assuming it is true to start, and only rejecting H0 if the difference, the other way, was too large. This doesn’t work because it doesn’t allow for A and B to be the same.)

14.4.2 General Examples

1. How do we write the null and alternative hypothesis for these cases?

  • We have two different observed proportions p̂a and p̂b and we want to know if pa is greater.

H0: pa ≤ pb, HA: pa > pb

  • We have samples from two different groups, x̄a and x̄b, and we want to know if group b has a lower mean.

H0: μb ≥ μa, HA: μb < μa

  • We have before and after samples from a set of patients measuring health outcomes and we want to know if patients were improved (assume a lower test value is better) after the intervention.

Let μd be the true mean of the difference, calculated as “after - before” (i.e. first subtract those vectors, then take the mean). Then, H0: μd ≥ 0, HA: μd < 0

In summary, the alternative hypothesis typically describes what we want to test or what we’re hoping to find.

2. When we’re calculating a critical value, particularly when our Null distribution is N(0,1) or a t distribution, how does the sign of the critical value relate to the inequality in the null hypothesis?

We only need to care about the sign of the critical value because both the t-distribution and the N(0, 1) distribution are centered at 0.

In fact it’s the inequality in the alternative hypothesis that points to where the critical value is located. If the alternative hypothesis is HA: μb < μa, then our point estimate will be x̄b - x̄a and our critical value will be negative. If the inequality points the other way, the critical value will be positive.

3. Specifically, how do we calculate the one-sided critical value when we have a N(0, 1) or a t-distribution with df degrees of freedom?

Assuming α=0.05, for a lower critical value we would use either the qnorm(0.05) or qt(0.05, df) functions. Similarly, for an upper critical value, we would use qnorm(0.95) or qt(0.95, df).
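For example, at α = 0.05 (taking df = 10 just for illustration):

```r
qnorm(0.05)    # lower critical value, N(0,1)
## [1] -1.644854
qnorm(0.95)    # upper critical value, N(0,1)
## [1] 1.644854
qt(0.05, 10)   # lower critical value, t with 10 df
## [1] -1.812461
qt(0.95, 10)   # upper critical value, t with 10 df
## [1] 1.812461
```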

14.4.3 Guided Practice

Answer the following questions in each of the given scenarios:

A. The federal government deems it “safe” if the average level of lead in water is less than 15 ppb (parts per billion). You collect data from a local school. What are appropriate null and alternative hypotheses if you want to evaluate whether the sample you just collected is above the safe limit? What type of hypothesis test would you use? At α=0.05, what would your critical value be?

B. Forest thinning (the process of removing small trees and shrubs) is generally accepted as a method for reducing wildfire, but what are the other impacts? You measure 12 plots of land before and after thinning treatments and count the number of species present. What are appropriate null and alternative hypotheses to evaluate if forest thinning reduces biodiversity (species counts)? What type of hypothesis test would you use? At α=0.05, what would your critical value be?

C. The EPA annually releases fuel economy data on cars manufactured in that year. What are appropriate null and alternative hypotheses to test if automatic transmissions have higher estimated fuel economy ratings than manual transmissions? What type of hypothesis test would you use? Assuming you select n1=25 cars with automatic transmissions and n2=21 cars with manual transmissions, at α=0.10 what would your critical value be?

D. Are allele frequencies different in different sub-populations? You take a sample from two different populations and measure the absence or presence of a specific allele. What are appropriate null and alternative hypotheses to evaluate if the proportion at which the given allele occurs in population A is less than the proportion at which it occurs in population B? What type of hypothesis test would you use? At α=0.01, what would your critical value be?

14.5 Exercises

14.5.1 Difference in Proportions

Exercise 14.1 An experiment is run to see if studying a year of Latin can increase verbal SAT scores. Students’ first-attempt SAT verbal scores were compared with their PSAT verbal scores, and it was noted whether the scores increased by at least 100 points or not. The following table shows the results.

  1. What is your point estimate of the difference in the two groups? Explain it in context. Based on this, what is your guess about whether studying Latin helps improve test scores?
  2. Run a hypothesis test at α=0.05 to evaluate if there is a statistically significant difference in SAT results for studying Latin or not. Be clear about all of the steps.
  3. Comment about why these results may not be generally applicable based on the experimental design.
increase in score   <=100   >100   total
studied Latin           6     14      20
didn’t                 11      8      19

 

Exercise 14.2 Repeat the analysis from the previous problem using the prop.test() function in R. How do your results compare?

 

Exercise 14.3 Using the data on studying Latin from above, create a 95% confidence interval on the true difference between the proportions. Note, when using the standard error for confidence intervals on the true difference in proportions, you should use the following formula, with z = 1.96 for a 95% confidence interval.

ˆp1ˆp2±z׈p1(1ˆp1)n1+ˆp2(1ˆp2)n2

What do the results tell you? How are your results consistent or inconsistent with the conclusion from your hypothesis above?

 

Exercise 14.4 Consider an experiment for patients who underwent cardiopulmonary resuscitation (CPR) for a heart attack and were subsequently admitted to a hospital. These patients were randomly divided into a treatment group where they received a blood thinner or the control group where they did not receive a blood thinner. The response variable of interest was whether the patients survived for at least 24 hours. Results are shown in the table below.

Perform a hypothesis test (by hand) to evaluate if there is a statistically significant difference at α=0.05 between survival in the control and treatment groups.

n Survived Died Total
Control 11 39 50
Treatment 14 26 40
Total 25 65 90

 

Exercise 14.5 Using the CPR data from above, create a 95% confidence interval on the true difference between the proportions.

What do the results tell you? How are your results consistent or inconsistent with your answers above?

 

Exercise 14.6 A supplier of parts for Boeing has two different factories that produce the same part. They are interested in understanding if the factories yield similar quality (i.e. in terms of the percent of the total that pass). The following table gives results from a recent production run at each:

  1. Run a hypothesis test at α=0.05 to evaluate if the yield rate (% that pass) between the two factories is the same. What do you conclude?
  2. What is a 90% confidence interval on the true difference between the yields from the two factories?
| quality | factory A | factory B |
|---------|-----------|-----------|
| passed  | 850       | 920       |
| failed  | 50        | 80        |
| total   | 900       | 1000      |
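Both parts can be checked with prop.test(); setting correct = FALSE turns off the continuity correction so the result matches the by-hand z test:

```r
# Yield data from the table above: factory A passed 850 of 900,
# factory B passed 920 of 1000
res <- prop.test(c(850, 920), c(900, 1000), correct = FALSE)
res$p.value                     # part 1: compare to alpha = 0.05
# part 2: a 90% interval on the true difference in yields
ci <- prop.test(c(850, 920), c(900, 1000),
                conf.level = 0.90, correct = FALSE)$conf.int
ci
```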

 

Exercise 14.7 A drone company is considering a new manufacturer for rotor blades. The new manufacturer would be more expensive, but they claim their higher-quality blades are more reliable, with 3% more blades passing inspection than their competitor's. The quality control engineer examines 1000 blades from each company and finds that 899 blades from the current supplier and 958 from the prospective supplier pass inspection. Perform a hypothesis test to evaluate the new manufacturer’s claim: is the difference between the reliability of the current blades and the new blades statistically different than 3% at the α=0.05 level?
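Note that prop.test() only tests a null difference of zero, so this exercise calls for a by-hand z test. Because the hypothesized difference is nonzero, pooling is not appropriate and the unpooled standard error is used. A sketch:

```r
# 899 of 1000 current blades and 958 of 1000 new blades passed inspection
p_new <- 958/1000
p_cur <- 899/1000
# unpooled SE, since H0 does not say the two proportions are equal
se <- sqrt(p_new * (1 - p_new) / 1000 + p_cur * (1 - p_cur) / 1000)
z <- ((p_new - p_cur) - 0.03) / se   # test against the claimed 3% difference
p_value <- 2 * pnorm(-abs(z))
```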

 

14.5.2 Difference in Means

Exercise 14.8 A city council member claims that male and female police officers wait equal times for promotion in the police department. A women’s spokesperson, however, argues that women wait longer than men. A random sample of men shows that they waited 8, 6.5, 9, 5.5 and 7 years for promotion, while a random sample of women shows that they waited 9.5, 5, 11.5, 8 and 10 years.

  1. What conclusion should be drawn? Be explicit about your null and alternative hypotheses, point estimate, standard error, test statistic and p-value.
  2. What steps should be done to strengthen your conclusion?

(modified from Barron’s AP Stats p297)
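A t.test() check of part 1, using Welch's two-sample test (one-sided, since the claim is that women wait longer):

```r
men   <- c(8, 6.5, 9, 5.5, 7)
women <- c(9.5, 5, 11.5, 8, 10)
res <- t.test(women, men, alternative = "greater")  # H_a: mu_women > mu_men
res$p.value
```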

 

Exercise 14.9 PM2.5 measures the concentration of fine particulate matter in the air. This measurement accounts for very small particles which can bypass the natural defenses of the human respiratory system and embed themselves in the lungs or migrate into the bloodstream. Repeated exposure can impact lung and cardiovascular function.

For reference, the EPA considers values between 0 and 10 μg/m3 to be under the WHO target, 10.1 to 12 to be “good”, 12.1 to 35.4 to be “moderate”, 35.5 to 55.4 to be “unhealthy for sensitive groups” and above 55.5 to be “unhealthy” or worse.

In 2019, California had 9 of the top 10 most polluted cities in the US, and one of the worst was Walnut Park (part of Los Angeles). Sunnyside, WA (near Yakima) ranked as the most polluted city in Washington state (in terms of PM2.5) in 2019. The following data shows the 2019 monthly averages for both Walnut Park, CA and Sunnyside, WA.

  1. Is there a statistically significant difference (at α=0.05) in average air quality in these two cities?
  2. What is a 90% confidence interval on the true average difference in PM2.5 between these cities?

(All data taken from iqair.com)

Walnut_Park <- c(24.5, 12.6, 11, 13, 11.1, 19.1, 20.6, 16.8, 11.5, 16.2, 24.9, 19.9)
Sunnyside <- c(10.5, 14.6, 13.8, 5.3, 7.6, 7.5, 10.3, 10.2, 6.4, 9.2, 22.1, 14.3)
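Both parts can be checked with t.test(), which runs Welch's two-sample test by default:

```r
Walnut_Park <- c(24.5, 12.6, 11, 13, 11.1, 19.1, 20.6, 16.8, 11.5, 16.2, 24.9, 19.9)
Sunnyside   <- c(10.5, 14.6, 13.8, 5.3, 7.6, 7.5, 10.3, 10.2, 6.4, 9.2, 22.1, 14.3)
res <- t.test(Walnut_Park, Sunnyside)   # part 1: two-sided, compare p to 0.05
# part 2: a 90% interval on the true difference in mean PM2.5
ci <- t.test(Walnut_Park, Sunnyside, conf.level = 0.90)$conf.int
ci
```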

 

Exercise 14.10 Write a custom R script or function that allows you to run a hypothesis test on the difference of means.

 

14.5.3 Paired Data

Exercise 14.11 Do Amazon books sell for less than those at your local bookstore? The following table shows the average difference in prices and standard deviation of the difference in prices for n=28 books for sale on both Amazon and at Island Books on Mercer Island. $x_{diff}$ represents the Island Books price minus the Amazon price.

Based on this data, run a hypothesis test at α=0.05 to evaluate if Amazon generally sells books for less than your local bookstore. What do you conclude?

 

| $n_{diff}$ | $\bar{x}_{diff}$ | $s_{diff}$ |
|------------|------------------|------------|
| 28         | 2.18             | 11.45      |
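With only summary statistics available, the paired test reduces to a one-sample t test on the differences. A sketch (one-sided, since the question is whether Amazon sells for less, i.e. whether the mean difference is greater than zero):

```r
# Summary statistics from the table above
n <- 28
xbar_diff <- 2.18    # mean of (Island Books price - Amazon price)
s_diff <- 11.45
se <- s_diff / sqrt(n)
t_stat <- xbar_diff / se
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-sided p-value
```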

 

Exercise 14.12 Forest thinning (removing small trees and shrubs) is generally accepted as a method for reducing wildfire, but what are the other impacts? You are interested in understanding how thinning affects biodiversity. You measure 12 1×1 meter plots of land immediately before and 1 year after thinning treatments, counting the number of plant species present at each time. The following data represents the counts of species before and after treatment.

Use a paired data test to run a hypothesis test at α=0.10 to evaluate if there is a statistically significant difference in the number of species before and after forest thinning treatments. What do you conclude and why?

 

before <- c(5, 12, 8, 5, 7, 5, 7, 4, 6, 5, 8, 11)
after <- c(6, 7, 6, 3, 6, 4, 3, 8, 5, 4, 4, 9)
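The paired = TRUE option on t.test() can confirm your by-hand work:

```r
before <- c(5, 12, 8, 5, 7, 5, 7, 4, 6, 5, 8, 11)
after  <- c(6, 7, 6, 3, 6, 4, 3, 8, 5, 4, 4, 9)
# equivalent to a one-sample t test on the differences (before - after)
res <- t.test(before, after, paired = TRUE)
res$p.value   # compare to alpha = 0.10
```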

 

Exercise 14.13 Write a custom R script or function that allows you to run a hypothesis test on paired data (a) given summary data and (b) using the raw data.