Reputation: 89
I have two pandas DataFrames. From each one I took a single column and set the dates column as the index, so I now have two Series. I need to find the correlation between those Series.
Here are a few rows from dfd:
index change
2018-12-31 -0.86
2018-12-30 0.34
2018-12-27 -0.94
2018-12-26 -1.26
2018-12-25 3.30
2018-12-24 -4.17
and from dfp:
index change
2018-12-31 0.55
2018-12-30 0.81
2018-12-27 -2.99
2018-12-26 0.50
2018-12-25 3.59
2018-12-24 -3.43
I tried:
correlation=dfp.corr(dfd)
and got the following error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
Upvotes: 3
Views: 601
Reputation: 26676
You can merge the two DataFrames on their date index and correlate the resulting columns:
dfd['date']=pd.to_datetime(dfd['date'])
dfd.set_index(dfd['date'], inplace=True)
dfd.drop(columns=['date'], inplace=True)
dfp['date']=pd.to_datetime(dfp['date'])
dfp.set_index(dfp['date'], inplace=True)
dfp.drop(columns=['date'], inplace=True)
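df = pd.merge(dfp, dfd, left_index=True, right_index=True, suffixes=('(dfp)', '(dfd)')).reset_index()
df
Both frames have a column named change, so the suffixes give the merged columns the names change(dfp) and change(dfd). Then correlate those two columns:
df['change(dfp)'].corr(df['change(dfd)'])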
Upvotes: 1
Reputation: 862711
The problem is that dfp is filled with string representations of numbers, so use Series.astype to convert them to floats:
correlation = dfp.astype(float).corr(dfd.astype(float))
print(correlation)
0.8624789983270312
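For reference, here is a minimal self-contained sketch of the conversion above. It rebuilds the six rows from the question as Series of strings, which is an assumption about how the data ended up being stored, and then converts both to floats before calling corr:

```python
import pandas as pd

# The six dates shown in the question, used as a shared index.
dates = pd.to_datetime([
    "2018-12-31", "2018-12-30", "2018-12-27",
    "2018-12-26", "2018-12-25", "2018-12-24",
])

# Assumption: the values were read in as strings rather than numbers.
dfd = pd.Series(["-0.86", "0.34", "-0.94", "-1.26", "3.30", "-4.17"],
                index=dates, name="change")
dfp = pd.Series(["0.55", "0.81", "-2.99", "0.50", "3.59", "-3.43"],
                index=dates, name="change")

# Inspect the dtype first: a non-float dtype here is the warning sign.
print(dfp.dtype)

# Convert both Series to floats, then compute the Pearson correlation.
correlation = dfp.astype(float).corr(dfd.astype(float))
print(correlation)
```

Checking the dtype of each Series before calling corr is a quick way to spot this kind of problem.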
If the Series contains some non-numeric values, the solution above fails. In that case use to_numeric with errors='coerce', which converts non-numbers to missing values:
correlation=pd.to_numeric(dfp, errors='coerce').corr(dfd)
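A small sketch of the to_numeric approach, using made-up values (the "n/a" entry and both Series here are hypothetical, not taken from the question). Unparseable entries become NaN, and Series.corr leaves out pairs where either value is missing:

```python
import pandas as pd

# Hypothetical data: one entry cannot be parsed as a number.
raw = pd.Series(["0.55", "n/a", "-2.99", "0.50"])
other = pd.Series([-0.86, 0.34, -0.94, -1.26])

# errors="coerce" turns anything unparseable into NaN instead of raising.
clean = pd.to_numeric(raw, errors="coerce")

# Count how many values were lost in the conversion.
n_missing = int(clean.isna().sum())
print(n_missing)

# corr skips the pair containing NaN and uses the remaining three.
correlation = clean.corr(other)
print(correlation)
```

It is worth checking the number of coerced values, because a large count means the correlation is computed on far fewer rows than expected.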
Upvotes: 5