I am trying to produce a correlation matrix from a pandas DataFrame using data from specified columns.
Here is my CSV data:
col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
Here are the two dataframes I created:
import pandas as pd
df1 = pd.read_csv('c:/temp/test_1.csv', usecols=[0])
df2 = pd.read_csv('c:/temp/test_1.csv', usecols=[1])
I tried the corr and corrwith methods and got the following results:
corr:
print(df1.corr(df2))
Result:
Error: Could not compare ['pearson'] with block values
corrwith:
print(df1.corrwith(df2))
Result:
col0 NaN
col1 NaN
dtype: float64
As you can see, there are no null values in the data set, and float64 should be able to handle decimals.
Any help finding a solution would be greatly appreciated.
Tiberius
If you are trying to create a correlation matrix between the two columns, I would suggest bringing them into the same dataframe, like so:
import pandas as pd
df = pd.read_csv('c:/temp/test_1.csv', usecols=[0, 1])
print(df.corr())
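I loaded your data into a CSV file myself and got a 2x2 correlation matrix of all 1s, which is expected: with only two rows, the Pearson correlation between any two non-constant columns is always exactly 1 or -1, and here both columns fall from the first row to the second.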
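A self-contained version of the same idea, as a sketch: the two sample rows from the question are inlined with `io.StringIO` (an assumption standing in for c:/temp/test_1.csv) so it runs without the file. `usecols` also accepts column names, e.g. `usecols=['col0', 'col1']`.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without
# c:/temp/test_1.csv (assumption: the file holds exactly these rows).
CSV_DATA = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Read the first two columns into ONE DataFrame.
df = pd.read_csv(io.StringIO(CSV_DATA), usecols=[0, 1])

# Pairwise Pearson correlation between every pair of columns (2x2 here).
corr_matrix = df.corr()
print(corr_matrix)
```

Adding more column positions or names to `usecols` grows the matrix accordingly.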
You can find documentation on the pandas correlation here: http://pandas.pydata.org/pandas-docs/stable/computation.html#correlation
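As for why the original attempts failed: `DataFrame.corr` does not take a second DataFrame. Its first parameter is `method` (such as `'pearson'`), so `df1.corr(df2)` passes a DataFrame where a method name is expected, and the exact error message varies by pandas version. `DataFrame.corrwith` does accept another DataFrame, but it pairs columns by label; `df1` only has `col0` and `df2` only has `col1`, so nothing matches and both labels come back NaN. The `float64` in that output is just the dtype of the returned Series. If the columns must stay in separate DataFrames, a sketch like this (same inlined-data assumption as above) correlates them anyway:

```python
import io

import pandas as pd

# Same sample rows as in the question (inlined; assumption).
CSV_DATA = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Recreate the question's two single-column DataFrames.
df1 = pd.read_csv(io.StringIO(CSV_DATA), usecols=[0])
df2 = pd.read_csv(io.StringIO(CSV_DATA), usecols=[1])

# corrwith pairs columns by label; with no shared labels every entry is NaN.
by_label = df1.corrwith(df2)

# Series.corr aligns on the row index instead, which both frames share.
pairwise = df1["col0"].corr(df2["col1"])

# Or place the frames side by side and build the full matrix.
combined = pd.concat([df1, df2], axis=1)

print(by_label)
print(pairwise)
print(combined.corr())
```

Reading both columns together with `usecols`, as in the answer, is still the simpler route; `pd.concat` is mainly useful when the frames come from different sources.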