Reputation: 581
I'm trying to make a multiindexed table (a matrix) of correlation coefficients and p-values. I'd prefer to use the scipy.stats tests.
import pandas as pd
from scipy.stats import pearsonr

x = pd.DataFrame(
    list(
        zip(
            [1, 2, 3, 4, 5, 6], [5, 7, 8, 4, 2, 8], [13, 16, 12, 11, 9, 10]
        )
    ),
    columns=['a', 'b', 'c']
)

# I've tried something like this
for i in range(len(x.columns)):
    r, p = pearsonr(x[x.columns[i]], x[x.columns[i+1]])
    print(f'{r}, {p}')
Obviously the for loop won't work. What I want to end up with is:
| | | a | b | c |
|---|---|---|---|---|
| a | r | 1.0 | -.09 | -.8 |
| | p | .00 | .87 | .06 |
| b | r | -.09 | 1 | .42 |
| | p | .87 | .00 | .41 |
| c | r | -.8 | .42 | 1 |
| | p | .06 | .41 | .00 |
I had written code to solve this problem (with help from this community) years ago, but it only worked for an older version of spearmanr.
Any help would be very much appreciated.
Upvotes: 1
Views: 597
Reputation: 13488
Here is one way to do it using scipy pearsonr and Pandas corr methods:
import pandas as pd
from scipy.stats import pearsonr

def pearsonr_pval(x, y):
    # pearsonr returns (statistic, p-value); keep only the p-value
    return pearsonr(x, y)[1]

# x is the DataFrame defined in the question
df = (
    pd.concat(
        [
            x.corr(method="pearson").reset_index().assign(value="r"),
            x.corr(method=pearsonr_pval).reset_index().assign(value="p"),
        ]
    )
    .groupby(["index", "value"])
    .agg(lambda x: list(x)[0])
).sort_index(ascending=[True, False])
df.index.names = ["", ""]
Then:
print(df)
# Output
            a         b         c
a r  1.000000 -0.088273 -0.796421
  p  1.000000  0.867934  0.057948
b r -0.088273  1.000000  0.421184
  p  0.867934  1.000000  0.405583
c r -0.796421  0.421184  1.000000
  p  0.057948  0.405583  1.000000
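Note that the diagonal p-values in the output above are 1.000000 rather than the .00 asked for in the question: when a callable is passed to corr, pandas fills the diagonal with 1 regardless of what the callable returns. If that matters, one alternative is to call the scipy test on every pair of columns directly and build the MultiIndex by hand. The following is only a sketch; the helper name corr_table and its test parameter are my own, not part of pandas or scipy.

```python
import pandas as pd
from scipy.stats import pearsonr


def corr_table(data, test=pearsonr):
    """Hypothetical helper: rows are (column, 'r'/'p'), columns are variables."""
    cols = list(data.columns)
    # from_product yields (a, r), (a, p), (b, r), (b, p), ... in that order
    index = pd.MultiIndex.from_product([cols, ["r", "p"]])
    values = []
    for row in cols:
        stats, pvals = [], []
        for col in cols:
            # the test is assumed to return a (statistic, p-value) pair
            r, p = test(data[row], data[col])
            stats.append(r)
            pvals.append(p)
        values.append(stats)
        values.append(pvals)
    return pd.DataFrame(values, index=index, columns=cols)


# Same data as in the question
x = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [5, 7, 8, 4, 2, 8],
        "c": [13, 16, 12, 11, 9, 10],
    }
)
table = corr_table(x)
print(table.round(2))
```

Here the diagonal p-values come from the test itself, so they are effectively zero. Since scipy.stats.spearmanr also returns a statistic and p-value pair for two 1-D inputs, passing test=spearmanr should work the same way, though I have only sketched the pearsonr case here.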
Upvotes: 1