Reputation: 1623
import pandas as pd
one = pd.DataFrame(data=[1,2,3,4,5], index=[1,2,3,4,5])
two = pd.DataFrame(data=[5,4,3,2,1], index=[1,2,3,4,5])
one.corr(two)
I expected this to return the float -1.0, but instead it raises the following error:
TypeError: Could not compare ['pearson'] with block values
Thanks in advance for your help.
Upvotes: 2
Views: 10076
Reputation: 138
You are operating on a DataFrame when you should be operating on a Series.
In [1]: import pandas as pd
In [2]: one = pd.DataFrame(data=[1,2,3,4,5], index=[1,2,3,4,5])
In [3]: two = pd.DataFrame(data=[5,4,3,2,1], index=[1,2,3,4,5])
In [4]: one
Out[4]:
   0
1  1
2  2
3  3
4  4
5  5
In [5]: two
Out[5]:
   0
1  5
2  4
3  3
4  2
5  1
In [6]: one[0].corr(two[0])
Out[6]: -1.0
Why subscript with [0]? Because that is the name of the column in the DataFrame: you didn't give it one, so pandas used the default integer label 0. When you select a single column of a DataFrame, you get back a Series, which is 1-dimensional, and Series.corr accepts another Series as its argument. See the documentation for pandas.Series.corr.
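If the default column label 0 feels opaque, the same idea can be sketched with named columns. The column names "a" and "b" below are made up for illustration and are not part of the original question:

```python
import pandas as pd

# Hypothetical column names "a" and "b", chosen only for this example.
one = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=[1, 2, 3, 4, 5])
two = pd.DataFrame({"b": [5, 4, 3, 2, 1]}, index=[1, 2, 3, 4, 5])

# Selecting a single column returns a 1-dimensional Series.
col_one = one["a"]
col_two = two["b"]

# Series.corr takes another Series and returns a single float
# (Pearson correlation by default).
r = col_one.corr(col_two)
print(type(col_one).__name__)
print(round(r, 6))
```

The two Series are matched on their index labels before the correlation is computed, so both frames sharing the index 1..5 matters here.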
Upvotes: 2
Reputation: 330063
pandas.DataFrame.corr computes pairwise correlation between the columns of a single data frame. What you need here is pandas.DataFrame.corrwith:
>>> one.corrwith(two)
0 -1
dtype: float64
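One thing worth knowing about corrwith is that it pairs columns by label, which is why the example above works: both frames have a column labelled 0. A small sketch with made-up frames (the names left, right, "x", "y" and "z" are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical frames with matching column labels "x" and "y".
left = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})
right = pd.DataFrame({"x": [5, 4, 3, 2, 1], "y": [1, 2, 3, 4, 5]})

# Each column of `left` is correlated with the same-named column of `right`.
result = left.corrwith(right)
print(result)

# If a label has no partner in the other frame, there is nothing to
# correlate it with, and the entry for that label is NaN.
mismatched = left.corrwith(right.rename(columns={"x": "z"}))
print(mismatched)
```

So if the two frames use different column names, either rename one of them first or fall back to selecting the columns and calling Series.corr.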
Upvotes: 7