Here are some initial questions you might attempt to answer:
… the faithful$eruptions data.
Plot the faithful$eruptions data. What trend or patterns do you see?
Use the table() function to find counts of different values of eruption data. Does this work, and is the result useful? Why or why not?
Plot a histogram of the faithful$eruptions data. What is a good value to use for the breaks parameter? Can you set the axis labels to meaningful values? How would you describe the distribution? Based on this plot, what is your estimate of the mode of the data?
Now, using your results from both the calculations in (3) and the histogram in (6), how would you describe the average eruption length to someone? What is the value, and why did you choose it?
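As a starting point for the table() and histogram questions, here is a minimal sketch. The value breaks = 20 and the label text are arbitrary choices for illustration, not recommended answers, and the variable names (eruption_counts, h, mode_est) are made up here. The modal-bin midpoint is only a rough estimate of the mode and will shift as you change breaks.

```r
# Count distinct values. Eruption lengths are continuous measurements,
# so most values occur only once or twice and the table is long and sparse.
eruption_counts <- table(faithful$eruptions)
length(eruption_counts)   # number of distinct eruption lengths

# Histogram with a suggested number of bins and labelled axes.
# breaks = 20 is an arbitrary starting value; try several.
h <- hist(faithful$eruptions, breaks = 20,
          xlab = "Eruption length (min)", ylab = "Count",
          main = "Eruption lengths at Old Faithful")

# Midpoint of the tallest bin: a rough, bin-dependent estimate of the mode.
mode_est <- h$mids[which.max(h$counts)]
mode_est
```

Rerunning with a few different breaks values is a quick way to see how much the apparent shape of the distribution depends on the binning.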
How many rows and how many columns? You can probably already see this above, but if we needed to calculate it, we could write:
dim(faithful)
## [1] 272 2
and similarly, to see the column names, we could write:
names(faithful)
## [1] "eruptions" "waiting"
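If you only need one of the two dimensions, or want a quick overview of the whole data frame, base R has a few closely related helpers. A small sketch:

```r
# Row and column counts individually (dim() returns both at once)
nrow(faithful)
ncol(faithful)

# Compact overview: dimensions, column names, column types, first few values
str(faithful)
```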
Here’s what you most likely found:
First the point plot:
plot(faithful$waiting, type="p", ylab="Waiting time (min)")
Then a line plot, with the mean added in red:
plot(faithful$waiting, type="l", ylab="Waiting time (min)")
abline(h=mean(faithful$waiting), col="red")
It’s almost hard to believe those are the same data, right? Just by connecting the dots, a much stronger pattern emerges. There seems to be some regularity to this. I don’t know what it is yet.
What patterns do people see in this data?
plot(table(faithful$waiting), xlab="Waiting Time (min)", ylab="Count")
abline(v=mean(faithful$waiting), col="red")
abline(v=median(faithful$waiting), col="green")
Ok, what does this show? If you’re confused about this, think about (look at?!) what the table() function outputs and recognize that’s what we’re plotting here. What did table() do on the WDL data?
table(faithful$waiting)
##
## 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 62 63 64 65 66 67 68 69 70
## 1 3 5 4 3 5 5 6 5 7 9 6 4 3 4 7 6 4 3 4 3 2 1 1 2 4
## 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96
## 5 1 7 6 8 9 12 15 10 8 13 12 14 10 6 6 2 6 3 6 1 1 2 1 1
table(faithful$waiting)[table(faithful$waiting)==max(table(faithful$waiting))]
## 78
## 15
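The one-liner above calls table() three times. The same idea can be wrapped in a small helper; this is only a sketch, and stat_mode is a hypothetical name (base R has no built-in function for the statistical mode, and the existing mode() function does something unrelated). It returns every value tied for the highest count, so it also works on data with more than one mode.

```r
# Hypothetical helper: most frequent value(s) of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)                             # tabulate once
  modes <- names(counts)[counts == max(counts)]  # keep all ties
  as.numeric(modes)                              # table names are strings
}

stat_mode(faithful$waiting)   # 78, matching the table output above
```

For continuous data such as faithful$eruptions, where few values repeat, the result of a helper like this is much less meaningful than it is for the whole-minute waiting times.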
For both the faithful$eruptions and faithful$waiting data:
Can you use sort() and/or table() or any other information above to calculate the mode?
Why are the mean and median different for this data set? See the plot above with the mean and median added. Also: are the mean and median always different? What is the mode here? How would we calculate it?
What’s the best estimate of how long someone would have to wait for the next eruption? This really comes down to what we mean by the “mean” (ha!)…
Does there seem to be a pattern in the data as shown by the point or line plot? What is the pattern? We talked about this above already, and hopefully it’s obvious that the line plot is superior. It won’t always be. Why is it ‘ok’ to add lines in this case?
Which plot is most useful for understanding the data? I think the answer is: it depends.
plot(faithful$waiting, faithful$eruptions, xlab="Waiting time (min)", ylab="Eruption length (min)", ylim=c(0, 6), main="Eruptions at Old Faithful", pch=1, col=2, type="p")
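Returning to the mean-versus-median question, it can help to put both numbers for both columns side by side. A minimal sketch (the name centers is made up here):

```r
# Mean and median of every column; the result is a 2 x 2 matrix with
# rows "mean" and "median" and one column per variable.
centers <- sapply(faithful, function(col) {
  c(mean = mean(col), median = median(col))
})
centers
```

In both columns the mean comes out below the median. The two clusters visible in the scatter plot are a natural place to look for the reason.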