user94628

Reputation: 3731

Calculating Pearson correlation

I'm trying to calculate the Pearson correlation coefficient of two variables, to determine whether there is a relationship between the number of postal codes and a range of distances. I want to see whether the number of postal codes increases or decreases as the distance range changes.

I'll have one list that counts the number of postal codes within each distance range, and another list that holds the actual ranges.

Is it OK to have a list that contains ranges of distances? Or would it be better to have a list like [50, 100, 500, 1000], where each element is the upper bound of a range? For example, the list would represent up to 50 km, then 50 km to 100 km, and so on.

Upvotes: 11

Views: 29657

Answers (4)

Cem Önel

Reputation: 851

In Python 3.10, a correlation() function was added to the statistics module of the standard library. It computes Pearson's correlation coefficient and can be used directly after importing the module:

import statistics

statistics.correlation(words, views)

Upvotes: 1

Shaurya

Reputation: 144

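Try this with pandas, where Top15 is a DataFrame. Note that calling .rank() first converts the values to ranks, so the result is Spearman's rank correlation; drop .rank() to get the plain Pearson coefficient:

val = Top15[['Energy Supply per Capita', 'Citable docs per Capita']].rank().corr(method='pearson')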

Upvotes: 0

Antimony

Reputation: 2240

You can also use numpy:

numpy.corrcoef(x, y)

which gives you a 2x2 correlation matrix that looks like:

[[1,                 correlation(x, y)],
 [correlation(y, x), 1                ]]
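
To get a single coefficient out of that matrix, you can index one of the off-diagonal entries. A small sketch, where the x and y values are made-up sample data rather than anything from the question:

```python
import numpy as np

# Hypothetical sample data (illustrative only).
x = [50, 100, 500, 1000]
y = [12, 30, 95, 160]

# corrcoef returns the full 2x2 correlation matrix for the two variables.
matrix = np.corrcoef(x, y)

# The off-diagonal entries hold the Pearson coefficient between x and y.
r = matrix[0, 1]
print(matrix)
print(r)
```

The matrix is symmetric, so matrix[0, 1] and matrix[1, 0] are the same value.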

Upvotes: 7

lucasg

Reputation: 11012

Use scipy:

scipy.stats.pearsonr(x, y)

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, it is the calculation of the p-value, not the coefficient itself, that assumes each dataset is normally distributed. Like other correlation coefficients, this one varies between -1 and +1, with 0 implying no linear correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.

Parameters :

x : 1D array

y : 1D array the same length as x

Returns :

(Pearson’s correlation coefficient, 2-tailed p-value)
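
A short usage sketch showing how the two return values can be unpacked. The x and y values are made-up sample data, not from the question:

```python
from scipy import stats

# Hypothetical sample data (illustrative only): two 1D sequences of equal length.
x = [50, 100, 500, 1000]
y = [12, 30, 95, 160]

# pearsonr returns the coefficient and the two-tailed p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p-value = {p_value:.4f}")
```

With only four points, the p-value should be read with caution, as the description above notes.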

Upvotes: 16
