Reputation: 4255
If we have:
X = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
How do we calculate Spearman's Rank Correlation between the two datasets (but not within each dataset), so that in the end we have a 5x5 matrix? Like this:
  A B C D E
A . . . . .
B . . . . .
C . . . . .
D . . . . .
E . . . . .
Upvotes: 1
Views: 4126
Reputation: 13800
Using pandas' concat and corr functions, you can turn this into a one-liner by putting everything together into one DataFrame:
import pandas as pd
X = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"A1":[45,24,65,65,65], "B1":[45,87,65,52,12], "C1":[98,52,32,32,12], "D1":[0,23,1,365,53], "E1":[24,12,65,3,65]})
pd.concat([X,Y], axis=1).corr(method="spearman").iloc[5:,:5]
Note that in my example I gave the second set of columns different names to make them easier to tell apart. Using pandas' indexing features you could come up with a more sophisticated way of picking out the desired rows/columns from the correlation table than my .iloc[5:,:5], but in this case it works.
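As one possible version of the "more sophisticated" selection mentioned above, here is a minimal sketch that picks the cross-block by label instead of by position. It assumes the original column names from the question (no renaming), and the keys "X" and "Y" passed to concat are illustrative labels of my own choosing, not anything pandas requires:

```python
import pandas as pd

X = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})

# keys= adds an outer column level ("X"/"Y" are hypothetical labels),
# so the duplicate names A..E stay distinguishable without renaming.
combined = pd.concat([X, Y], axis=1, keys=["X", "Y"])

# Full 10x10 Spearman table; both its index and its columns carry the two levels.
full = combined.corr(method="spearman")

# Rows coming from X, columns coming from Y: the 5x5 cross-correlation block.
cross = full.loc["X"]["Y"]
print(cross)
```

Here the rows come from X and the columns from Y; cross.T gives the other orientation, which is the one .iloc[5:,:5] returns.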
Upvotes: 3
Reputation: 629
This should do the trick, though it could probably be made shorter:
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
X = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
# One row per column of X, one column per column of Y
m = np.zeros((X.shape[1], Y.shape[1]))
for row, (key_x, val_x) in enumerate(X.items()):
    for col, (key_y, val_y) in enumerate(Y.items()):
        # spearmanr returns (correlation, p-value); keep only the correlation
        m[row, col] = spearmanr(val_x, val_y)[0]
# Label the result so it reads like the matrix in the question
print(pd.DataFrame(m, index=X.columns, columns=Y.columns))
To calculate the correlation, I am using scipy.stats.spearmanr, but there are other alternatives such as:
numpy.corrcoef (applied to the rank-transformed data)
pandas.DataFrame.corr (with method="spearman")
And probably some others too ;)
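To illustrate the numpy.corrcoef alternative: Spearman's coefficient is Pearson's coefficient computed on ranks, so one way to avoid the double loop is to rank each column first and let corrcoef handle all pairs at once. This is only a sketch under that assumption; the variable names (rx, ry, full, cross) are mine:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})

# Rank each column independently; ties receive the average rank.
rx = X.rank()
ry = Y.rank()

# rowvar=False treats columns as variables. Passing both arrays yields a
# 10x10 Pearson matrix over the ranks, with X's columns first, then Y's.
n = X.shape[1]
full = np.corrcoef(rx.to_numpy(), ry.to_numpy(), rowvar=False)

# The upper-right block holds the X-versus-Y correlations.
cross = pd.DataFrame(full[:n, n:], index=X.columns, columns=Y.columns)
print(cross)
```

Rows correspond to the columns of X and columns to the columns of Y, matching the layout produced by the loop.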
Upvotes: 0