How to get the p-value and Pearson's r for a list of columns in Pandas?

Question

I'm trying to make a multiindexed table (a matrix) of correlation coefficients and p-values. I'd prefer to use the scipy.stats tests.

import pandas as pd
from scipy.stats import pearsonr

x = pd.DataFrame(
    list(zip([1, 2, 3, 4, 5, 6], [5, 7, 8, 4, 2, 8], [13, 16, 12, 11, 9, 10])),
    columns=['a', 'b', 'c'],
)
 

# I've tried something like this
for i in range(len(x.columns)):
    r,p = pearsonr(x[x.columns[i]], x[x.columns[i+1]])
    print(f'{r}, {p}')

Obviously the for loop won't work. What I want to end up with is:

		a	b	c
a	r	1.0	-.09	-.8
	p	.00	.87	.06
b	r	-.09	1.0	.42
	p	.87	.00	.41
c	r	-.8	.42	1.0
	p	.06	.41	.00

I wrote code to solve this problem (with help from this community) years ago, but it only worked with an older version of spearmanr.

Any help would be very much appreciated.

Laurent · Accepted Answer

Here is one way to do it using scipy's pearsonr and the Pandas DataFrame.corr method:

import pandas as pd
from scipy.stats import pearsonr

def pearsonr_pval(x, y):
    # pearsonr returns (r, p-value); keep only the p-value
    return pearsonr(x, y)[1]


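# x is the DataFrame defined in the question
df = (
    pd.concat(
        [
            # r matrix and p-value matrix, each tagged with its statistic
            x.corr(method="pearson").reset_index().assign(value="r"),
            x.corr(method=pearsonr_pval).reset_index().assign(value="p"),
        ]
    )
    # one row per (column, statistic) pair
    .groupby(["index", "value"])
    .agg(lambda s: list(s)[0])
).sort_index(ascending=[True, False])  # "r" before "p" within each column

# Blank out the index level names for display
df.index.names = ["", ""]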

Then:

print(df)
# Output
            a         b         c

a r  1.000000 -0.088273 -0.796421
  p  1.000000  0.867934  0.057948
b r -0.088273  1.000000  0.421184
  p  0.867934  1.000000  0.405583
c r -0.796421  0.421184  1.000000
  p  0.057948  0.405583  1.000000
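
Note that the p rows show 1.000000 on the diagonal rather than the .00 in the desired table. When DataFrame.corr is given a callable, pandas fills the diagonal with 1 regardless of what the callable returns. If that matters, one possible alternative is to call pearsonr once per pair of columns and assemble the table by hand. This is only a minimal sketch: the helper name pearson_table is hypothetical, and setting the diagonal to r = 1.0 and p = 0.0 is an assumption taken from the desired output in the question rather than something computed.

```python
import itertools

import pandas as pd
from scipy.stats import pearsonr

# Same data as in the question
x = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5, 6], "b": [5, 7, 8, 4, 2, 8], "c": [13, 16, 12, 11, 9, 10]}
)


def pearson_table(frame):
    """Hypothetical helper: r and p for every column pair, indexed by (column, statistic)."""
    cols = list(frame.columns)
    # Assumption: a column correlates perfectly with itself (r = 1.0, p = 0.0)
    r = pd.DataFrame(1.0, index=cols, columns=cols)
    p = pd.DataFrame(0.0, index=cols, columns=cols)
    # Visit each unordered pair once and mirror the result across the diagonal
    for left, right in itertools.combinations(cols, 2):
        r_val, p_val = pearsonr(frame[left], frame[right])
        r.loc[left, right] = r.loc[right, left] = r_val
        p.loc[left, right] = p.loc[right, left] = p_val
    # For each column, stack its "r" row on top of its "p" row
    return pd.concat(
        {col: pd.DataFrame({"r": r.loc[col], "p": p.loc[col]}).T for col in cols}
    )


table = pearson_table(x)
print(table.round(2))
```

The off-diagonal values should match the output above, since both approaches call pearsonr on the same column pairs. Only the diagonal of the p rows differs.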
