Reputation: 3731
I'm trying to calculate the Pearson correlation coefficient of two variables. I want to determine whether there is a relationship between the number of postal codes and a range of distances, that is, whether the number of postal codes increases or decreases as the distance range changes.
I'll have one list that counts the number of postal codes within each distance range, and another list that holds the ranges themselves.
Is it OK to have a list that contains ranges of distances? Or would it be better to have a list like [50, 100, 500, 1000], where each element is the upper bound of a range? For example, the list would represent up to 50km, then 50km to 100km, and so on.
Upvotes: 11
Views: 29657
Reputation: 851
In Python 3.10, the correlation() function was added to the statistics module of the Python standard library. It can be used directly after importing the module:
import statistics
statistics.correlation(words, views)
Upvotes: 1
Reputation: 144
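Try this with pandas. Note that calling .rank() first means the result is Pearson's correlation of the ranks (i.e. Spearman's rank correlation); drop .rank() for the plain Pearson coefficient: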
val=Top15[['Energy Supply per Capita','Citable docs per Capita']].rank().corr(method='pearson')
Upvotes: 0
Reputation: 2240
You can also use numpy:
numpy.corrcoef(x, y)
which would give you a correlation matrix that looks like:
[[1 correlation(x, y)]
[correlation(y, x) 1]]
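To get the single coefficient out of that matrix, index an off-diagonal element. A short sketch, with made-up sample arrays (the names x, y and r are only for illustration):

```python
import numpy as np

# Hypothetical sample data, roughly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# 2x2 symmetric matrix with ones on the diagonal.
corr_matrix = np.corrcoef(x, y)

# Either off-diagonal entry is the Pearson correlation of x and y.
r = corr_matrix[0, 1]
print(r)
```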
Upvotes: 7
Reputation: 11012
Use scipy:
scipy.stats.pearsonr(x, y)
Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, it is the calculation of the p-value that assumes each dataset is normally distributed; the coefficient itself can be computed for any pair of equal-length numeric datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
Parameters:
x : 1D array
y : 1D array, the same length as x
Returns:
(Pearson's correlation coefficient, 2-tailed p-value)
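A brief usage sketch, assuming scipy is installed. The sample lists are made up for illustration, and the result is unpacked into the two values described above:

```python
from scipy import stats

# Hypothetical sample data of equal length.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

# pearsonr returns the correlation coefficient and the two-tailed p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With so few points the p-value should be read with caution, in line with the note above about small datasets.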
Upvotes: 16