Reputation: 5993
I have data sets that each consist of two arrays of equal length (or, equivalently, one array of two-item entries), and I would like to calculate the correlation between the two arrays and its statistical significance. The data may be tightly correlated, or may have no statistically significant correlation.
I am programming in Python and have scipy and numpy installed. I looked and found Calculating Pearson correlation and significance in Python, but that seems to want the data to be manipulated so it falls into a specified range.
What is the proper way to, I assume, ask scipy or numpy to give me the correlation and statistical significance of two arrays?
Upvotes: 9
Views: 14629
Reputation: 3253
If you want to calculate the Pearson correlation coefficient, then scipy.stats.pearsonr is the way to go, although the significance is only meaningful for larger data sets. This function does not require the data to be manipulated to fall into a specified range. The value of the correlation falls in the interval [-1, 1]; perhaps that was the confusion?
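As a minimal sketch of the pearsonr call described above: the data here is made up for illustration (the names `pairs`, `x`, and `y` are hypothetical), and it starts from an array of two-item entries to show that either input layout works.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: y is a noisy linear function of x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# If the data arrives as an array of two-item entries, split it into columns.
pairs = np.column_stack((x, y))
x, y = pairs[:, 0], pairs[:, 1]

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")
```

A small p-value suggests the correlation is unlikely to be zero; with only a handful of points, treat it with caution.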
If the significance is not terribly important, you can use numpy.corrcoef().
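A short sketch of the numpy.corrcoef() route, using made-up values for `x` and `y`. Note that it returns a correlation matrix rather than a single number, so the coefficient has to be read from an off-diagonal entry.

```python
import numpy as np

# Hypothetical, nearly linear sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# corrcoef returns the full 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between x and y.
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]
print(r)
```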
The Mahalanobis distance does take into account the correlation between two arrays, but it provides a distance measure, not a correlation. (With a fixed, positive-definite covariance matrix, the Mahalanobis distance is a proper distance function, and it can be used as such in certain contexts to great advantage.)
Upvotes: 7
Reputation: 18477
scipy.spatial.distance.euclidean()
This gives the Euclidean distance between two points, which can be passed as two NumPy arrays, two lists, etc.
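import numpy as np
import scipy.spatial.distance as spsd
nparray1, nparray2 = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])  # example 1-D inputs
spsd.euclidean(nparray1, nparray2)  # Euclidean distance between the two arrays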
You can find more info here: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
Upvotes: 0
Reputation: 5629
You can use the Mahalanobis distance between these two arrays, which takes into account the correlation between them.
The function is in the scipy package: scipy.spatial.distance.mahalanobis
There's a nice example here
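For reference, a rough sketch of calling scipy.spatial.distance.mahalanobis. The function takes two points plus the inverse of a covariance matrix, so this sketch assumes the covariance is estimated from a set of observations; the data and the names `samples`, `u`, `v`, and `VI` are made up for illustration.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical data: 500 observations of two correlated variables
# (each row is one observation).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# mahalanobis() expects the inverse of the covariance matrix.
VI = np.linalg.inv(np.cov(samples, rowvar=False))

# Distance between two points, scaled by the estimated covariance.
u = np.array([1.0, 1.0])
v = np.array([0.0, 0.0])
d = distance.mahalanobis(u, v, VI)
print(d)
```

The result is a distance between the points `u` and `v`, not a correlation coefficient or a significance value.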
Upvotes: 2