Reputation: 334
I have a dataset with 4 variables (Bearing 1 to Bearing 4) and 20,152,319 observations.
I am trying to compute the correlation matrix of the 4 variables. This is the code I use:
corr_mat = Data.corr(method = 'pearson')
print(corr_mat)
However, the result only contains correlations for Bearing 2 to Bearing 4. Bearing 1 is nowhere to be seen.
I have tried removing NULL values from each of the variables and checking for missing values, but nothing works. Interestingly, if I isolate the first two variables (Bearing 1 and Bearing 2) and compute the correlation matrix between them, Bearing 1 still does not show up and the result is a 1x1 matrix containing only Bearing 2.
Any explanation on why this occurs and how to solve it would be appreciated.
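A minimal reproduction of the symptom, assuming (as the answers below suggest) that Bearing 1 was read in as strings rather than numbers. The small DataFrame and its values are made up for illustration. `select_dtypes(include='number')` lists the columns pandas treats as numeric, and `corr(numeric_only=True)` drops the non-numeric column the same way the plain `corr()` call did in pandas versions before 2.0 (the `numeric_only` parameter exists since pandas 1.5; from 2.0 on it defaults to False).

```python
import pandas as pd

# Hypothetical sample: 'Bearing 1' holds numbers stored as strings,
# which can happen when a CSV column contains stray text or odd formatting.
Data = pd.DataFrame({
    'Bearing 1': ['0.1', '0.2', '0.3', '0.5'],
    'Bearing 2': [1.0, 2.0, 3.0, 5.0],
    'Bearing 3': [2.0, 1.0, 4.0, 3.0],
    'Bearing 4': [5.0, 4.0, 2.0, 1.0],
})

# 'Bearing 1' is object (or a string dtype in newer pandas), not float64.
print(Data.dtypes)

# Columns pandas considers numeric; 'Bearing 1' is not among them.
numeric_cols = Data.select_dtypes(include='number').columns
print(list(numeric_cols))

# With numeric_only=True the non-numeric column is silently dropped (3x3 result).
corr_mat = Data.corr(method='pearson', numeric_only=True)
print(corr_mat)
```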
Upvotes: 0
Views: 1435
Reputation: 862406
The dtype of the first column is object, so pandas omits it by default. The solution is to convert it to numeric:
Data['Bearing 1'] = Data['Bearing 1'].astype(float)
Or, if some values are non-numeric, use to_numeric with errors='coerce' to parse those values to NaNs:
Data['Bearing 1'] = pd.to_numeric(Data['Bearing 1'], errors='coerce')
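A short sketch of what errors='coerce' does, using hypothetical values that include one non-numeric entry ('n/a'): that entry becomes NaN, the column becomes float64, and Bearing 1 then appears in the correlation matrix. `DataFrame.corr` excludes missing values pairwise, so the NaN row is only skipped for pairs that involve Bearing 1.

```python
import pandas as pd

# Hypothetical data: one entry in 'Bearing 1' is text, so the column is not numeric.
Data = pd.DataFrame({
    'Bearing 1': ['0.1', '0.2', 'n/a', '0.5', '0.4'],
    'Bearing 2': [1.0, 2.0, 3.0, 5.0, 4.0],
})

# astype(float) would raise ValueError on 'n/a'; to_numeric with
# errors='coerce' turns unparseable entries into NaN instead.
Data['Bearing 1'] = pd.to_numeric(Data['Bearing 1'], errors='coerce')

print(Data['Bearing 1'].dtype)         # float64
print(Data['Bearing 1'].isna().sum())  # 1 (the 'n/a' entry)

# Bearing 1 is now included; rows with NaN are excluded pairwise.
corr_mat = Data.corr(method='pearson')
print(corr_mat)
```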
If you want to convert all columns to numeric:
Data = Data.astype(float)
Or:
Data = Data.apply(pd.to_numeric, errors='coerce')
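A sketch of the all-columns variant with a check afterwards, assuming every Bearing column is meant to be numeric (the sample values are hypothetical). `DataFrame.apply` calls `pd.to_numeric` on each column and forwards the extra keyword argument errors='coerce'; once every dtype is numeric, `corr` returns the full 4x4 matrix.

```python
import pandas as pd

# Hypothetical sample where every column was read as text,
# including one unparseable entry in 'Bearing 4'.
Data = pd.DataFrame({
    'Bearing 1': ['0.1', '0.2', '0.3', '0.5'],
    'Bearing 2': ['1.0', '2.5', '2.0', '4.0'],
    'Bearing 3': ['2.0', '1.0', '4.0', '3.0'],
    'Bearing 4': ['5.0', '4.0', 'bad', '1.0'],
})

# apply() calls pd.to_numeric column by column; errors='coerce' is forwarded.
Data = Data.apply(pd.to_numeric, errors='coerce')

# Every column should now have a numeric dtype.
assert all(pd.api.types.is_numeric_dtype(dt) for dt in Data.dtypes)

corr_mat = Data.corr(method='pearson')
print(corr_mat.shape)  # (4, 4)
```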
Upvotes: 1
Reputation: 121
Check whether the first column, 'Bearing 1', is numeric:
import pandas as pd
Data.dtypes  # Shows the dtype of each column
cols = Data.columns  # Save the column names to a variable
Data[cols] = Data[cols].apply(pd.to_numeric, errors='coerce')  # Convert the columns to numeric and assign the result back
Now run your calculation:
corr_mat = Data.corr(method = 'pearson')
print(corr_mat)
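Because errors='coerce' silently replaces unparseable entries with NaN, it can help to look at which raw values were blocking the conversion before discarding them. A minimal sketch with hypothetical sample values: compare the coerced column with the original and keep entries that were present originally but became NaN.

```python
import pandas as pd

# Hypothetical column with a few entries that are not plain numbers.
Data = pd.DataFrame({'Bearing 1': ['0.1', '0.2', '1,5', None, 'abc', '0.4']})

parsed = pd.to_numeric(Data['Bearing 1'], errors='coerce')

# Entries that were not missing originally but could not be parsed.
bad_mask = parsed.isna() & Data['Bearing 1'].notna()
bad_values = Data.loc[bad_mask, 'Bearing 1']
print(bad_values.unique())
```

On a column with millions of rows, `bad_values.value_counts()` gives a compact summary of how often each offending string occurs.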
Upvotes: 2