Correlation heatmap turned values into nan in Python

Question

I want to plot a correlation heatmap of my table df, which looks normal at first:

    Total   Paid Post Engaged   Negative    like 
1   2178    0    0    66        0           1207
2   1042    0    0    60        0           921
3   2096    0    0    112       0           1744
4   1832    0    0    109       0           1718
5   1341    0    0    38        0           889
6   1933    0    0    123       0           1501
    ...

but after I applied:

df= full_Data.iloc[1:,4:10]
df= pd.DataFrame(df,columns=['A','B','C', 'D', 'E', 'F'])

corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()

it returned an empty graph:

C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:204: RuntimeWarning: All-NaN slice encountered
  vmin = np.nanmin(calc_data)
C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:209: RuntimeWarning: All-NaN slice encountered
  vmax = np.nanmax(calc_data)

and df returned:

    A   B   C   D   E   F
1   nan nan nan nan nan nan
2   nan nan nan nan nan nan
3   nan nan nan nan nan nan
4   nan nan nan nan nan nan
5   nan nan nan nan nan nan
    ...

Why are all the values turned into NaN?

Update:

I tried renaming the columns without passing df and the new names to the constructor as before:

df.columns = ['A','B','C', 'D', 'E', 'F']

and

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

and both raised an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
     12 
     13 corrMatrix = df.corr()
---> 14 sn.heatmap(corrMatrix, annot=True)
     15 plt.show()
     16 

~\Anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~\Anaconda3\lib\site-packages\seaborn\matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
    545     plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
    546                           annot_kws, cbar, cbar_kws, xticklabels,
--> 547                           yticklabels, mask)
    548 
    549     # Add the pcolormesh kwargs here

~\Anaconda3\lib\site-packages\seaborn\matrix.py in __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
    164         # Determine good default values for the colormapping
    165         self._determine_cmap_params(plot_data, vmin, vmax,
--> 166                                     cmap, center, robust)
    167 
    168         # Sort out the annotations

~\Anaconda3\lib\site-packages\seaborn\matrix.py in _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
    202                 vmin = np.nanpercentile(calc_data, 2)
    203             else:
--> 204                 vmin = np.nanmin(calc_data)
    205         if vmax is None:
    206             if robust:

<__array_function__ internals> in nanmin(*args, **kwargs)

~\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py in nanmin(a, axis, out, keepdims)
    317         # Fast, but not safe for subclasses of ndarray, or object arrays,
    318         # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
--> 319         res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
    320         if np.isnan(res).any():
    321             warnings.warn("All-NaN slice encountered", RuntimeWarning,

ValueError: zero-size array to reduction operation fmin which has no identity

jezrael · Accepted Answer

I think the problem is that a DataFrame object is passed to the pd.DataFrame constructor. The original column names differ from the new names in the list, so the constructor looks up the new names among the existing columns, finds none of them, and creates only NaNs.
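
To illustrate the label lookup described above, here is a minimal sketch. The frame `src` and its two columns are a made-up stand-in for the sliced table in the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for full_Data.iloc[1:, 4:10] from the question.
src = pd.DataFrame({"Total": [2178, 1042], "like": [1207, 921]}, index=[1, 2])

# Passing a DataFrame together with columns= selects columns by label.
# "A" and "B" do not exist in src, so every cell becomes NaN.
aligned = pd.DataFrame(src, columns=["A", "B"])
print(aligned)

# A NumPy array carries no labels, so the new names are simply assigned.
# index=src.index is added here to keep the original row labels.
renamed = pd.DataFrame(src.to_numpy(), columns=["A", "B"], index=src.index)
print(renamed)
```

Note that the plain `to_numpy()` call without `index=` resets the row labels to 0, 1, 2, and so on, which may or may not matter for the heatmap.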

One solution is to convert it to a NumPy array first:

df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])

Or set the new column names in a separate step, without the DataFrame constructor:

df = full_Data.iloc[1:,4:10]
df.columns = ['A','B','C', 'D', 'E', 'F']

Another solution is to rename with a dict built from the existing columns only:

old = df.columns
new = ['A','B','C', 'D', 'E', 'F']

df = df.rename(columns=dict(zip(old, new)))
print (df)
      A  B  C    D  E     F
1  2178  0  0   66  0  1207
2  1042  0  0   60  0   921
3  2096  0  0  112  0  1744
4  1832  0  0  109  0  1718
5  1341  0  0   38  0   889
6  1933  0  0  123  0  1501

print (df.corr())
          A   B   C         D   E         F
A  1.000000 NaN NaN  0.606808 NaN  0.727034
B       NaN NaN NaN       NaN NaN       NaN
C       NaN NaN NaN       NaN NaN       NaN
D  0.606808 NaN NaN  1.000000 NaN  0.916325
E       NaN NaN NaN       NaN NaN       NaN
F  0.727034 NaN NaN  0.916325 NaN  1.000000
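
The NaN rows and columns for B, C and E in this output are expected: those columns are constant (all zeros), so their variance is zero and the correlation is undefined. A possible way to keep them out of the heatmap is to drop constant columns first. The sketch below uses a small made-up frame, and the `nunique` filter is just one option, not something from the question.

```python
import pandas as pd

# Small illustrative frame: B is constant, A and D vary.
df = pd.DataFrame({
    "A": [2178, 1042, 2096, 1832],
    "B": [0, 0, 0, 0],
    "D": [66, 60, 112, 109],
})

corr = df.corr()
print(corr)  # the B row and column are NaN

# Keep only columns with more than one distinct value before correlating.
varying = df.loc[:, df.nunique() > 1]
clean = varying.corr()
print(clean)
```

Passing `clean` instead of `corr` to `sn.heatmap` would then give a plot without blank cells.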

EDIT:

The problem was that the columns were not numeric.

df = df.astype(int)

Or:

df = df.apply(pd.to_numeric, errors='coerce')
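
The difference between the two conversions shows up when a cell cannot be parsed: `astype(int)` raises an error, while `pd.to_numeric` with `errors='coerce'` turns that cell into NaN. A minimal sketch, assuming the values were read in as strings (the sample values, including the "n/a" cell, are invented for illustration):

```python
import pandas as pd

# Hypothetical data where numbers were read in as text.
raw = pd.DataFrame({
    "A": ["2178", "1042", "2096"],
    "D": ["66", "60", "n/a"],  # one cell that is not a number
})

# Convert every column; unparseable cells become NaN instead of raising.
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)

# With numeric dtypes, corr() has something to work with.
print(numeric.corr())
```

Checking `df.dtypes` before calling `corr()` is a quick way to spot this situation.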
