sumitpal0593

Reputation: 334

Why is the full correlation matrix not getting calculated in Python?

I have a dataset with 4 variables (Bearing 1 to Bearing 4) and 20152319 observations. It looks like this:

A snapshot of the dataset I am working on

Now, I am trying to find the correlation matrix of the 4 variables. The code I use is this:

corr_mat = Data.corr(method = 'pearson')
print(corr_mat)

However, the result only contains correlations for Bearing 2 to Bearing 4; Bearing 1 is nowhere to be seen. Here is a snapshot of the result:

Snapshot of the correlation matrix

I have tried removing NULL values from each variable and checking for missing values, but nothing works. Interestingly, if I isolate the first two variables (Bearing 1 and Bearing 2) and compute their correlation matrix, Bearing 1 still does not appear: the result is a 1x1 matrix containing only Bearing 2.

Any explanation on why this occurs and how to solve it would be appreciated.

Upvotes: 0

Views: 1435

Answers (2)

jezrael

Reputation: 862406

The dtype of the first column is object, so pandas omits it by default. The solution is to convert it to numeric:

Data['Bearing 1'] = Data['Bearing 1'].astype(float)

Or, if the column contains some non-numeric values, use to_numeric with errors='coerce' to parse those values to NaN:

Data['Bearing 1'] = pd.to_numeric(Data['Bearing 1'], errors='coerce')
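A small illustration of the difference between the two options, using a hypothetical `raw` Series that stands in for 'Bearing 1' (the values, including the stray "n/a" entry, are invented for the example): `astype(float)` raises on the first unparseable string, while `to_numeric(..., errors='coerce')` replaces it with NaN.

```python
import pandas as pd

# Hypothetical column: numeric strings plus one unparseable entry ("n/a")
raw = pd.Series(["0.12", "0.15", "n/a", "0.11"], name="Bearing 1")

try:
    raw.astype(float)  # raises ValueError because "n/a" is not a number
except ValueError as exc:
    print("astype(float) failed:", exc)

# errors='coerce' turns every unparseable value into NaN instead of raising
converted = pd.to_numeric(raw, errors="coerce")
print(converted.dtype)         # float64
print(converted.isna().sum())  # 1
```

Keep in mind that coercion silently turns bad entries into NaN, which `corr` then skips pairwise, so it is worth counting how many values were coerced.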

If you want to convert all columns to numeric:

Data = Data.astype(float)

Or:

Data = Data.apply(pd.to_numeric, errors='coerce')
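The sketch below reproduces the symptom from the question on a tiny hypothetical DataFrame (all values are invented) and then applies the fix. It passes `numeric_only=True` explicitly, assuming pandas 1.5 or later, to mirror the behaviour of silently dropping object columns; pandas 2.0 changed the default to `numeric_only=False`, so what a bare `Data.corr()` does with an object column depends on your version. It also shows why isolating Bearing 1 and Bearing 2 yields a 1x1 matrix: only one numeric column is left.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: 'Bearing 1' was read in as
# strings (object dtype); the other three columns are already floats.
Data = pd.DataFrame({
    "Bearing 1": ["0.10", "0.20", "0.30", "0.45", "0.50"],
    "Bearing 2": [0.11, 0.19, 0.33, 0.41, 0.52],
    "Bearing 3": [0.50, 0.40, 0.35, 0.20, 0.10],
    "Bearing 4": [0.30, 0.31, 0.29, 0.35, 0.33],
})
print(Data.dtypes)  # 'Bearing 1' shows up as object

# With numeric_only=True the object column is skipped: a 3x3 result
before = Data.corr(method="pearson", numeric_only=True)
print(before.shape)  # (3, 3)

# Isolating Bearing 1 and Bearing 2 leaves one numeric column: a 1x1 result
pair = Data[["Bearing 1", "Bearing 2"]].corr(method="pearson", numeric_only=True)
print(pair.shape)  # (1, 1)

# Convert every column to numeric, then recompute: the full 4x4 matrix
Data = Data.apply(pd.to_numeric, errors="coerce")
corr_mat = Data.corr(method="pearson")
print(corr_mat.shape)  # (4, 4)
```

Converting only 'Bearing 1', as in the first snippet of this answer, gives the same 4x4 result when the other columns are already float.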

Upvotes: 1

Bala

Reputation: 121

Check whether the first column, 'Bearing 1', is numeric:

Data.dtypes # This will show the type of each column

cols = Data.columns  # Saving column names to a variable
Data[cols] = Data[cols].apply(pd.to_numeric, errors='coerce')  # Converting the columns to numeric and assigning the result back
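Beyond `Data.dtypes`, it can help to locate the exact entries that keep a column from being numeric. A minimal sketch on hypothetical data (the stray "error" string is invented): `select_dtypes(exclude="number")` lists the non-numeric columns, and comparing `to_numeric(..., errors="coerce")` against `notna()` picks out values that are present but unparseable. Such values are not NaN, so the checks for missing values tried in the question will not flag them.

```python
import pandas as pd

# Hypothetical data: 'Bearing 1' is object dtype because of one stray string
Data = pd.DataFrame({
    "Bearing 1": ["0.10", "0.20", "error", "0.45"],
    "Bearing 2": [0.11, 0.19, 0.33, 0.41],
})

# Columns whose dtype is not numeric (these are the ones corr may drop)
non_numeric = Data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['Bearing 1']

# Entries that are present (not NaN) but cannot be parsed as numbers
col = Data["Bearing 1"]
bad = col[pd.to_numeric(col, errors="coerce").isna() & col.notna()]
print(bad)  # a single entry at index 2 with value 'error'
```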

Now run your calculation:

corr_mat = Data.corr(method = 'pearson')
print(corr_mat)

Upvotes: 2
