Comparison between Pearson’s correlation and distance correlation coefficients. While the linear correlation is almost zero for the quadratic and cubic relationships, the distance correlation captures such functional dependencies.
Instead of using Pearson’s correlation coefficient as the underlying association metric, we employ a relatively recent statistical measure called distance correlation.
The distance correlation coefficient generalizes Pearson’s correlation to the nonlinear and multivariate case by measuring statistical dependency between any two sets of multivariate variables. Two variables may be linearly uncorrelated and still carry some form of nonlinear dependency, which Pearson’s correlation coefficient typically fails to detect. In other words, a linear correlation of zero does not imply independence, whereas a distance correlation of zero does imply independence.
The figure shows the main difference between linear correlation and distance correlation in different scenarios. In this example, random data was generated and three functional dependencies were simulated: linear, quadratic, and cubic. While Pearson’s correlation coefficient captures only the linear dependency, the distance correlation coefficient also captures the quadratic and cubic statistical dependencies. Note that the distance correlation vanishes only when the two variables are statistically independent.
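The experiment above can be sketched in a few lines of Python. The helper below is a straightforward implementation of the empirical distance correlation via double-centered pairwise distance matrices; the function name, the sample size, and the use of a quadratic relationship are illustrative choices, not part of the original figure's setup.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()   # squared distance covariance
    dvar_x = (A * A).mean()  # squared distance variances
    dvar_y = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# Quadratic dependency: y is fully determined by x, yet linearly uncorrelated
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
dcor = distance_correlation(x, y)
print(f"Pearson: {pearson:.3f}, distance correlation: {dcor:.3f}")
```

For `x` symmetric around zero, Pearson's coefficient is near zero on the quadratic relationship, while the distance correlation remains clearly positive, mirroring the behavior shown in the figure.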