Pearson correlation (also known as Pearson’s correlation coefficient) is a statistical measure of the strength and direction of the linear relationship between two variables. It is denoted by the symbol ‘r’ and ranges from -1 to +1.
A Pearson correlation coefficient of +1 indicates a perfect positive correlation: all data points lie exactly on a straight line with a positive slope, so as one variable increases, the other increases linearly. A coefficient of -1 indicates a perfect negative correlation: all points lie exactly on a straight line with a negative slope, so as one variable increases, the other decreases linearly. The steepness of the line does not affect r; only how closely the points follow a straight line does. A coefficient of 0 indicates no linear correlation between the variables, although a nonlinear relationship may still exist.
The formula to calculate the Pearson correlation coefficient is:
r = (nΣXY - ΣXΣY) / (sqrt[nΣX^2 - (ΣX)^2] * sqrt[nΣY^2 - (ΣY)^2])
Where:
- n is the number of observations
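- X and Y are the paired values of the two variables whose correlation you want to measure
- Σ is the summation symbol, meaning “add up the values over all n observations”; for example, ΣXY is the sum of the products of each X value with its paired Y value, ΣX^2 is the sum of the squared X values, and (ΣX)^2 is the square of the sum of the X values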
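The formula above translates directly into code. The following is a minimal sketch in Python; the function name `pearson_r` and the small dataset are hypothetical, chosen only for illustration. For X = 1, 2, 3 and Y = 2, 4, 5 the sums are n = 3, ΣX = 6, ΣY = 11, ΣXY = 25, ΣX^2 = 14 and ΣY^2 = 45, so r = (75 - 66) / (sqrt(6) * sqrt(14)) = 9 / sqrt(84), which is about 0.982.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Hypothetical helper: Pearson's r via the raw-sums formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    # The five sums that appear in the formula.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # A constant variable has zero spread, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


X = [1, 2, 3]
Y = [2, 4, 5]
r = pearson_r(X, Y)
print(f"r = {r:.3f}")  # r = 0.982
```

Summing raw values and their squares is easy to follow but can lose precision through cancellation when the values are large relative to their spread; subtracting the means before multiplying is the numerically safer choice in that situation.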
The Pearson correlation coefficient is widely used in fields such as psychology, sociology, economics, and finance to analyze relationships between variables.
Pearson’s correlation is a parametric measure. Its usual assumptions, which matter most when r is tested for statistical significance, are:
- Normality: The variables being correlated should be approximately normally distributed (strictly, bivariate normal for the standard significance test).
- Linearity: The relationship between the variables being correlated should be linear.
- Homoscedasticity: The spread of one variable should be roughly constant across all values of the other variable.
- Independence: Each pair of observations should be independent of the other pairs.
- Interval or ratio level of measurement: The variables being correlated should be measured on an interval or ratio scale.