The Central Limit Theorem (CLT) is a key theorem in statistics and probability. It states that if you have a population with mean μ (mu) and standard deviation σ (sigma), and you take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal. This holds no matter the shape of the population distribution.
The theorem is central to many applications of statistics, because it underpins the idea that we can make inferences about a population based on samples.
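A quick simulation makes this concrete. The sketch below (the exponential population, seed, and sample sizes are illustrative choices, not part of the theorem) draws many samples from a strongly skewed population and shows that the sample means still cluster symmetrically around the population mean:

```python
import random
import statistics

random.seed(42)

# A skewed population: exponential with rate 1, so μ = 1 and σ = 1.
def sample_mean(n):
    """Mean of one random sample of size n from the population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Draw 5,000 sample means, each from a sample of size 40.
means = [sample_mean(40) for _ in range(5_000)]

# Despite the skew, the sample means center on μ = 1, and their spread
# is close to σ / sqrt(n) = 1 / sqrt(40) ≈ 0.16.
print(statistics.fmean(means))   # close to 1.0
print(statistics.stdev(means))   # close to 0.16
```

Plotting a histogram of `means` would show the familiar bell shape, even though a histogram of the raw exponential draws would not.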
To break it down:
1. Random samples: You’re randomly choosing values from the population, such that every possible sample has an equal chance of being selected.
2. Sufficiently large: The sample size must be large enough for the normal approximation to be good. A common rule of thumb is a sample size of 30 or more, though heavily skewed populations may need larger samples.
3. With replacement: After drawing a value from the population, you put it back before the next draw. This makes every draw independent of the others.
4. Approximately normally distributed: The shape of the distribution of sample means will resemble a “bell curve,” or normal distribution.
5. Population Mean (μ): This is the average of all the values in the population.
6. Population Standard Deviation (σ): This measures the dispersion of the population values.
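The ingredients above can be sketched in a few lines. In this illustration (the small population and sample size are made up for the example), `random.choices` samples with replacement, so every draw is independent and every value has an equal chance each time:

```python
import random
import statistics

# A small, skewed finite "population" (any shape works for the CLT).
population = [1, 1, 1, 2, 2, 3, 5, 8, 13, 21]

mu = statistics.fmean(population)       # population mean μ
sigma = statistics.pstdev(population)   # population standard deviation σ

random.seed(0)

# "With replacement": each of the n draws is independent, and each
# population value has the same chance on every draw.
sample = random.choices(population, k=30)   # n = 30, the usual rule of thumb

print(mu, sigma)                 # parameters of the population
print(statistics.fmean(sample))  # one sample mean; varies from sample to sample
```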
The theorem also tells us that the mean of these sample means equals the population mean μ, and that the standard deviation of the sample means (known as the standard error) equals the population standard deviation divided by the square root of the sample size, σ/√n.
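Both claims are easy to check numerically. This sketch uses a uniform population on [0, 10] (an arbitrary choice with known μ = 5 and σ = 10/√12) and compares the observed spread of the sample means against the predicted standard error σ/√n:

```python
import random
import statistics

random.seed(1)

# Uniform population on [0, 10]: μ = 5, σ = 10 / sqrt(12) ≈ 2.89.
mu, sigma, n = 5.0, 10 / 12**0.5, 25

# Many sample means, each computed from a sample of size n.
means = [statistics.fmean(random.uniform(0, 10) for _ in range(n))
         for _ in range(4_000)]

standard_error = sigma / n**0.5   # predicted: σ / sqrt(n) ≈ 0.577

print(statistics.fmean(means))    # close to μ = 5
print(statistics.stdev(means))    # close to the predicted standard error
```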
The Central Limit Theorem is fundamental to hypothesis testing and confidence intervals in statistics because it allows us to make probabilistic statements about the sample means and infer back to the population mean. It’s why many statistical techniques rely on normal distributions.
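For example, the CLT is what justifies the familiar normal-based 95% confidence interval x̄ ± 1.96·σ/√n. The sketch below assumes a hypothetical setting where σ is known (the sample itself is simulated for illustration):

```python
import random
import statistics

random.seed(7)

# One sample of size n from a population with known σ (hypothetical setup).
n, sigma = 100, 2.0
sample = [random.gauss(10.0, sigma) for _ in range(n)]

x_bar = statistics.fmean(sample)
se = sigma / n**0.5   # standard error of the mean

# Because the sample mean is approximately normal, about 95% of such
# intervals will contain the true population mean.
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)
print(ci)
```

In practice σ is usually unknown and is replaced by the sample standard deviation, which leads to t-based intervals; the normal interval shown here is the idealized CLT case.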