nilsinelabore

Reputation: 5095

Correlation heatmap turned values into nan in Python

I want to plot a correlation heatmap of my table df, which looks normal at first:

    Total   Paid Post Engaged   Negative    like 
1   2178    0    0    66        0           1207
2   1042    0    0    60        0           921
3   2096    0    0    112       0           1744
4   1832    0    0    109       0           1718
5   1341    0    0    38        0           889
6   1933    0    0    123       0           1501
    ...

but after I applied:

df= full_Data.iloc[1:,4:10]
df= pd.DataFrame(df,columns=['A','B','C', 'D', 'E', 'F'])

corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()

it returned an empty graph:

C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:204: RuntimeWarning: All-NaN slice encountered
  vmin = np.nanmin(calc_data)
C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:209: RuntimeWarning: All-NaN slice encountered
  vmax = np.nanmax(calc_data)


and df returned:

    A   B   C   D   E   F
1   nan nan nan nan nan nan
2   nan nan nan nan nan nan
3   nan nan nan nan nan nan
4   nan nan nan nan nan nan
5   nan nan nan nan nan nan
    ...

Why are all the values turned into nan?


Update:

I tried renaming the columns in two other ways instead of the original approach:

df.columns = ['A','B','C', 'D', 'E', 'F']

and

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

and both raised an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-3a27f095066b> in <module>
     12 
     13 corrMatrix = df.corr()
---> 14 sn.heatmap(corrMatrix, annot=True)
     15 plt.show()
     16 

~\Anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~\Anaconda3\lib\site-packages\seaborn\matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
    545     plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
    546                           annot_kws, cbar, cbar_kws, xticklabels,
--> 547                           yticklabels, mask)
    548 
    549     # Add the pcolormesh kwargs here

~\Anaconda3\lib\site-packages\seaborn\matrix.py in __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
    164         # Determine good default values for the colormapping
    165         self._determine_cmap_params(plot_data, vmin, vmax,
--> 166                                     cmap, center, robust)
    167 
    168         # Sort out the annotations

~\Anaconda3\lib\site-packages\seaborn\matrix.py in _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
    202                 vmin = np.nanpercentile(calc_data, 2)
    203             else:
--> 204                 vmin = np.nanmin(calc_data)
    205         if vmax is None:
    206             if robust:

<__array_function__ internals> in nanmin(*args, **kwargs)

~\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py in nanmin(a, axis, out, keepdims)
    317         # Fast, but not safe for subclasses of ndarray, or object arrays,
    318         # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
--> 319         res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
    320         if np.isnan(res).any():
    321             warnings.warn("All-NaN slice encountered", RuntimeWarning,

ValueError: zero-size array to reduction operation fmin which has no identity

Upvotes: 3

Views: 1367

Answers (2)

gilf0yle

Reputation: 1102

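Why are all the values turned into nan?

This happens because three columns in your dataframe are 0 throughout. A constant column has zero variance, so its correlation with anything is undefined and comes back as NaN. Try dropping those columns and you should see the nan problem resolved for the remaining ones. For example, with a dataframe whose columns all vary: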

  df

      A  B  C   D   E     F
0  2178  1  2   6   0  1207
1  2178  3  3  66   1  1207
2   178  0  0  66  45    17
3    21  2  0  66   0  1207
4  2178  0  0  66   0  1207
5  2178  0  0  66   0  1207

df.corr()

          A         B         C         D         E         F
A  1.000000 -0.023103  0.485124 -0.315890 -0.591457  0.595989
B -0.023103  1.000000  0.713746  0.000000 -0.371648  0.387298
C  0.485124  0.713746  1.000000 -0.430007 -0.290624  0.307148
D -0.315890  0.000000 -0.430007  1.000000  0.205308 -0.200000
E -0.591457 -0.371648 -0.290624  0.205308  1.000000 -0.999761
F  0.595989  0.387298  0.307148 -0.200000 -0.999761  1.000000

whereas when every column is constant, as in this example:

df

      A  B  C   D  E     F
0  2178  0  0  66  0  1207
1  2178  0  0  66  0  1207
2  2178  0  0  66  0  1207
3  2178  0  0  66  0  1207
4  2178  0  0  66  0  1207
5  2178  0  0  66  0  1207

df.corr()

    A   B   C   D   E   F
A NaN NaN NaN NaN NaN NaN
B NaN NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN NaN
D NaN NaN NaN NaN NaN NaN
E NaN NaN NaN NaN NaN NaN
F NaN NaN NaN NaN NaN NaN

The rest is as answered by jezrael.
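
As a rough illustration of the point about all-zero columns, here is a minimal sketch. The frame is a made-up stand-in that borrows a few values from the question (not the asker's real data), and `varying` is a name introduced only for this example. It shows that a constant column yields NaN in `corr()`, and that filtering on `nunique()` is one possible way to drop such columns before plotting.

```python
import pandas as pd

# Hypothetical sample: B is constant, A and D vary.
df = pd.DataFrame({
    "A": [2178, 1042, 2096, 1832],
    "B": [0, 0, 0, 0],            # zero variance
    "D": [66, 60, 112, 109],
})

corr = df.corr()
print(corr)  # the B row and B column are NaN

# Keep only the columns that take more than one distinct value.
varying = df.loc[:, df.nunique() > 1]
corr_varying = varying.corr()
print(corr_varying)  # no NaN left
```

Note that this only removes the NaN rows and columns caused by constant data; it does not explain a dataframe that is itself all NaN.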

Upvotes: 0

jezrael

Reputation: 862601

I think the problem is that a DataFrame object is passed to the pd.DataFrame constructor. The columns argument then selects columns by label instead of renaming them, and since the new names from the list do not match the original column names, only NaNs are created.

One solution is to convert it to a NumPy array first:
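
The effect can be reproduced in a few lines. This is only a sketch on a made-up two-column frame (`src`, `reindexed` and `mixed` are names invented for the example; the column labels are borrowed from the question's table):

```python
import pandas as pd

# Hypothetical stand-in for the sliced frame.
src = pd.DataFrame({"Total": [2178, 1042, 2096], "like": [1207, 921, 1744]})

# Neither "A" nor "B" exists in src, so both columns come back empty.
reindexed = pd.DataFrame(src, columns=["A", "B"])
print(reindexed)

# A label that does exist is kept with its values; the unknown one is NaN.
mixed = pd.DataFrame(src, columns=["Total", "A"])
print(mixed)
```

The second call suggests that the constructor is matching on labels rather than assigning names by position.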

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

Or set the new column names in a separate step, without the DataFrame constructor:

df = full_Data.iloc[1:,4:10]
df.columns = ['A','B','C', 'D', 'E', 'F']

Another solution is to build a dict from the existing column names and rename:

old = df.columns
new = ['A','B','C', 'D', 'E', 'F']

df = df.rename(columns=dict(zip(old, new)))
print (df)
      A  B  C    D  E     F
1  2178  0  0   66  0  1207
2  1042  0  0   60  0   921
3  2096  0  0  112  0  1744
4  1832  0  0  109  0  1718
5  1341  0  0   38  0   889
6  1933  0  0  123  0  1501

print (df.corr())
          A   B   C         D   E         F
A  1.000000 NaN NaN  0.606808 NaN  0.727034
B       NaN NaN NaN       NaN NaN       NaN
C       NaN NaN NaN       NaN NaN       NaN
D  0.606808 NaN NaN  1.000000 NaN  0.916325
E       NaN NaN NaN       NaN NaN       NaN
F  0.727034 NaN NaN  0.916325 NaN  1.000000

EDIT:

The problem was that the columns were not numeric.

df = df.astype(int)

Or:

df = df.apply(pd.to_numeric, errors='coerce')
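
A small sketch of the conversion, assuming the numbers were read in as strings. The frame `raw` and its values are hypothetical, with one deliberately unparseable entry:

```python
import pandas as pd

# Hypothetical frame whose numbers arrived as text.
raw = pd.DataFrame({
    "A": ["2178", "1042", "2096", "1832"],
    "F": ["1207", "921", "n/a", "1718"],   # "n/a" cannot be parsed
})

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = raw.apply(pd.to_numeric, errors="coerce")

print(raw.dtypes)      # text columns
print(numeric.dtypes)  # numeric columns
print(numeric.corr())  # corr() skips the row with the missing value
```

`astype(int)` is the stricter option: it raises on a value such as "n/a", which can be useful when bad data should not pass silently.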

Upvotes: 3
