Zach Wilkins

# Confidence Intervals: Can I Be Excused?

No. You can't. All humans should eat their statistical broccoli. My dad used to get very frustrated when I would spite him by eating broccoli with cinnamon. So let me undo the woe by teaching him (and y'all indirectly) about math's gaseous vegetable.

After several of my mentees asked for more guidance on confidence intervals and sample vs. population statistics, I wrote up some examples that I've decided to share.

Let's say a total troll barrels into the town of North Pole, Alaska. They're completely convinced that everyone in this town is an elf. The troll believes that an elf population would be much shorter than the general human population. They take measurements of 100 people they found. The sample seems to be of shorter stature, with a mean height of 25.7 inches. Definitely in the realm of gnomes or dwarves! The troll posts the results, declaring that North Pole is certainly a harbor of otherworldly humanoids.

North Pole comes out swinging and says, "Hold on, knave! We've ordered all 5,000 humans to be measured in an action of National Security. Our mean height is 67.94 inches. This is a totally normal average for humans." Here is the distribution of the heights. Each bar shows how many people have that specific height. This very closely resembles the density of a normal distribution!

Santa's Statisticians host a news conference. The group's goal is to show how sampling *can* come to the same conclusion as the troll. They've hired some UPS drivers to gather samples of people as they make deliveries (likely toys and hot cocoa). Over the course of 50 weeks, the drivers get permission to collect 100 heights from their Nordic comrades. Some of the samples they posted look a little far from the mean that the town posted, but they're pretty close overall. In the image below, we get a look at **confidence intervals** for the first time. Each dot represents the sample mean from each week. The red line represents the 67.94-inch *population* mean that the town posted previously. The bars extending from each dot are the 95% confidence intervals.

In the image above, **3 blue** samples out of the 50 (6% of them!) do not include the true mean within the 95% confidence interval. That makes a lot of sense, doesn't it? That's very close to the measure of confidence that we chose. We basically said, "We're okay with our range missing the true mean 5% of the time." And that's just about what we got! Let's talk about how we got here.
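You can see this coverage property for yourself with a minimal simulation sketch. Here I assume a normal population with the town's 67.94-inch mean and a made-up 4-inch standard deviation (the article never states the population's spread), then run 50 hypothetical weeks of sampling:

```python
import math
import random
from statistics import NormalDist, stdev

# The 67.94-inch mean comes from the town census in the story;
# the 4-inch standard deviation is an assumption for illustration.
POP_MEAN = 67.94
POP_SD = 4.0
Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% multiplier, about 1.96

random.seed(42)  # reproducible "weeks" of sampling

n_weeks, n_per_sample = 50, 100
hits = 0
for _ in range(n_weeks):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n_per_sample)]
    mean = sum(sample) / n_per_sample
    se = stdev(sample) / math.sqrt(n_per_sample)  # standard error
    low, high = mean - Z_95 * se, mean + Z_95 * se
    if low <= POP_MEAN <= high:
        hits += 1

print(f"{hits} of {n_weeks} intervals captured the true mean")
```

With a 95% interval, you should see roughly 47 or 48 of the 50 intervals capture the true mean, with a few misses just like the blue outliers in the plot.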

When a sample is taken, there's no way to be *sure* whether its mean is close to the true population mean or not. However, statistics gives us these nifty little confidence intervals that approximate a range where the true mean should be, centered on the sample mean at hand. To be more confident that the population mean falls within a range around the sample mean, that range must be widened. The range itself is defined by several variables, starting with the variance of the given metric and how many observations are in the sample.

The square root of the variance divided by the square root of the number of observations gives the **standard error** (equivalently, the sample standard deviation divided by √n). It measures how accurately the sample distribution represents the true population distribution.
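As a quick sketch, here's that formula with Python's standard library, using a made-up handful of heights (not data from the story):

```python
import math
from statistics import variance

# A toy sample of heights in inches, invented for illustration.
heights = [68.1, 66.5, 70.2, 67.8, 69.0, 65.9, 68.7, 67.3]

n = len(heights)
# standard error = sqrt(sample variance) / sqrt(n)
se = math.sqrt(variance(heights)) / math.sqrt(n)
print(f"standard error ≈ {se:.3f} inches")
```

Notice that `n` sits in the denominator: with more observations, the standard error shrinks, and the confidence interval built from it tightens.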

The third variable that affects the range is the level of confidence one would like to use. This value is derived from the standard deviations of the standard normal distribution, pulling a **Z-score** to multiply with the standard error. How this works would take a longer explanation than I'm ready to write out, but usually one just looks up the Z-score associated with a desired level of confidence. Simply put, one needs to know how many standard deviations from the mean will contain xx% of the standard normal distribution's density. A Z-score of 1 (or 1 standard deviation) on either side of the mean will contain roughly 68% of the distribution's density. A Z-score of 2 contains just over 95% of the density. Even though each standard deviation is equidistant, a smaller percentage of density is gained the further away from the mean one gets.
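If you'd rather compute the lookup than read it off a table, a two-sided Z-score is just the standard normal quantile at (1 + confidence) / 2. A small sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    # Two-sided interval: split the leftover density evenly
    # between the two tails, then ask for the upper quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.68, 0.95):
    print(f"{c:.0%} -> z = {z_score(c):.2f}")
```

This reproduces the rule of thumb above: about 1 standard deviation for 68% of the density, and just under 2 for 95%.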

If one wants to be 75% confident that their interval contains the true mean, those bars will be very *narrow* compared to a range of 99% confidence with the same data. To be more confident, one must account for extreme possibilities that could be quite far away from the given sample mean. The Z-score multiplier associated with 75% is 1.15, with 95% is 1.96, and with 99% is 2.576.
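Here's a short sketch comparing those multipliers side by side, using a hypothetical sample mean and standard error (stand-ins, not numbers from the story):

```python
from statistics import NormalDist

# Hypothetical week of UPS-driver data.
sample_mean = 67.5      # inches (invented)
standard_error = 0.4    # inches (invented)

for confidence in (0.75, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * standard_error
    print(f"{confidence:.0%}: {sample_mean - half_width:.2f} "
          f"to {sample_mean + half_width:.2f}  (z = {z:.2f})")
```

Same data, three intervals: the 99% bar is more than twice as wide as the 75% bar, which is exactly the trade-off between confidence and precision described above.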

So it is definitely possible to have extreme results like what the troll found. But it's very, very unlikely for a random sample to be soooo discordant from the other samples. It's much more likely the troll had some sampling bias, like measuring only kids in a daycare or something extra troll-like. However, statisticians have given us the power to protect ourselves from such biases with sampling theory, distributions, and confidence intervals. Knowledge is the power to starve the Christmas troll.