Reputation: 5095
I want to plot a heatmap of my table df, which looks normal at first:
Total Paid Post Engaged Negative like
1 2178 0 0 66 0 1207
2 1042 0 0 60 0 921
3 2096 0 0 112 0 1744
4 1832 0 0 109 0 1718
5 1341 0 0 38 0 889
6 1933 0 0 123 0 1501
...
but after I applied:
df= full_Data.iloc[1:,4:10]
df= pd.DataFrame(df,columns=['A','B','C', 'D', 'E', 'F'])
corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()
it returned an empty graph:
C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:204: RuntimeWarning: All-NaN slice encountered
vmin = np.nanmin(calc_data)
C:\Users\User\Anaconda3\lib\site-packages\seaborn\matrix.py:209: RuntimeWarning: All-NaN slice encountered
vmax = np.nanmax(calc_data)
and df returned:
A B C D E F
1 nan nan nan nan nan nan
2 nan nan nan nan nan nan
3 nan nan nan nan nan nan
4 nan nan nan nan nan nan
5 nan nan nan nan nan nan
...
Why are all the values turned into nan?
Update:
I tried renaming the columns of df in two other ways instead of the original one:
df.columns = ['A','B','C', 'D', 'E', 'F']
and
df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])
and both raised an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-43-3a27f095066b> in <module>
12
13 corrMatrix = df.corr()
---> 14 sn.heatmap(corrMatrix, annot=True)
15 plt.show()
16
~\Anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~\Anaconda3\lib\site-packages\seaborn\matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
545 plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
546 annot_kws, cbar, cbar_kws, xticklabels,
--> 547 yticklabels, mask)
548
549 # Add the pcolormesh kwargs here
~\Anaconda3\lib\site-packages\seaborn\matrix.py in __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
164 # Determine good default values for the colormapping
165 self._determine_cmap_params(plot_data, vmin, vmax,
--> 166 cmap, center, robust)
167
168 # Sort out the annotations
~\Anaconda3\lib\site-packages\seaborn\matrix.py in _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
202 vmin = np.nanpercentile(calc_data, 2)
203 else:
--> 204 vmin = np.nanmin(calc_data)
205 if vmax is None:
206 if robust:
<__array_function__ internals> in nanmin(*args, **kwargs)
~\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py in nanmin(a, axis, out, keepdims)
317 # Fast, but not safe for subclasses of ndarray, or object arrays,
318 # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
--> 319 res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
320 if np.isnan(res).any():
321 warnings.warn("All-NaN slice encountered", RuntimeWarning,
ValueError: zero-size array to reduction operation fmin which has no identity
Upvotes: 3
Views: 1367
Reputation: 1102
Why all the values are turned into nan?
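Part of the problem is that three columns in your DataFrame are 0 throughout. A constant column has zero variance, so its correlation with any other column is undefined and df.corr() returns NaN for it. Try dropping those columns and those NaN entries disappear. For example, with columns that all vary: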
df
A B C D E F
0 2178 1 2 6 0 1207
1 2178 3 3 66 1 1207
2 178 0 0 66 45 17
3 21 2 0 66 0 1207
4 2178 0 0 66 0 1207
5 2178 0 0 66 0 1207
df.corr()
A B C D E F
A 1.000000 -0.023103 0.485124 -0.315890 -0.591457 0.595989
B -0.023103 1.000000 0.713746 0.000000 -0.371648 0.387298
C 0.485124 0.713746 1.000000 -0.430007 -0.290624 0.307148
D -0.315890 0.000000 -0.430007 1.000000 0.205308 -0.200000
E -0.591457 -0.371648 -0.290624 0.205308 1.000000 -0.999761
F 0.595989 0.387298 0.307148 -0.200000 -0.999761 1.000000
whereas with constant columns, as in your case:
df
A B C D E F
0 2178 0 0 66 0 1207
1 2178 0 0 66 0 1207
2 2178 0 0 66 0 1207
3 2178 0 0 66 0 1207
4 2178 0 0 66 0 1207
5 2178 0 0 66 0 1207
df.corr()
A B C D E F
A NaN NaN NaN NaN NaN NaN
B NaN NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN NaN
D NaN NaN NaN NaN NaN NaN
E NaN NaN NaN NaN NaN NaN
F NaN NaN NaN NaN NaN NaN
The rest is as answered by jezrael.
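Dropping the constant columns before computing the correlation could look roughly like this. It is a minimal sketch on made-up data (the frame and the name varying are assumptions for illustration, not the asker's real table), and it uses nunique() as one possible way to detect constant columns:

```python
import pandas as pd

# Hypothetical sample: column B is 0 throughout, A and D vary.
df = pd.DataFrame(
    {
        "A": [2178, 1042, 2096, 1832],
        "B": [0, 0, 0, 0],
        "D": [66, 60, 112, 109],
    }
)

# A constant column has zero variance, so every correlation with it is NaN.
corr_with_constant = df.corr()

# Keep only the columns that take more than one distinct value.
varying = df.loc[:, df.nunique() > 1]
corr = varying.corr()

print(corr_with_constant)
print(corr)
```

The resulting corr contains only A and D and can be passed to sn.heatmap without the all-NaN rows and columns.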
Upvotes: 0
Reputation: 862601
I think the problem is passing a DataFrame object to the pd.DataFrame constructor. The original column names differ from the new column names in the list, so the constructor selects columns by those labels, finds none of them, and only NaNs are created.
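This behaviour can be reproduced on a small frame. The sketch below uses made-up column names and values (original and new_names are assumptions for illustration); it only shows the cause, not the fix:

```python
import pandas as pd

# Hypothetical stand-in for the sliced table, with its original column labels.
original = pd.DataFrame(
    {"Total": [2178, 1042, 2096], "Engaged": [66, 60, 112]},
    index=[1, 2, 3],
)
new_names = ["A", "B"]

# Passing a DataFrame together with columns= looks up the labels "A" and "B"
# in the existing columns; none of them exist, so every cell becomes NaN.
reindexed = pd.DataFrame(original, columns=new_names)

print(reindexed)
print(reindexed.isna().all().all())
```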
One solution is to convert it to a NumPy array first:
df= pd.DataFrame(df.to_numpy(),columns=['A','B','C', 'D', 'E', 'F'])
Or set the new column names in a separate step, without the DataFrame constructor:
df = full_Data.iloc[1:,4:10]
df.columns = ['A','B','C', 'D', 'E', 'F']
Another solution is to build a dict from the existing column names and rename:
old = df.columns
new = ['A','B','C', 'D', 'E', 'F']
df = df.rename(columns=dict(zip(old, new)))
print (df)
A B C D E F
1 2178 0 0 66 0 1207
2 1042 0 0 60 0 921
3 2096 0 0 112 0 1744
4 1832 0 0 109 0 1718
5 1341 0 0 38 0 889
6 1933 0 0 123 0 1501
print (df.corr())
A B C D E F
A 1.000000 NaN NaN 0.606808 NaN 0.727034
B NaN NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN NaN
D 0.606808 NaN NaN 1.000000 NaN 0.916325
E NaN NaN NaN NaN NaN NaN
F 0.727034 NaN NaN 0.916325 NaN 1.000000
EDIT:
The problem was that the columns were not numeric.
df = df.astype(int)
Or:
df = df.apply(pd.to_numeric, errors='coerce')
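To check whether this applies to your data, inspect the dtypes before and after the conversion. The sketch below is an illustration on made-up values stored as strings (raw, numeric_before and converted are assumed names). Older pandas versions likely drop non-numeric columns silently in df.corr(), which would leave an empty correlation matrix and explain the zero-size array error, but the exact behaviour depends on the pandas version:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, so every column is object dtype.
raw = pd.DataFrame(
    {
        "A": ["2178", "1042", "2096", "1832"],
        "D": ["66", "60", "112", "109"],
        "F": ["1207", "921", "1744", "1718"],
    },
    dtype=object,
)

# No column counts as numeric yet.
numeric_before = raw.select_dtypes(include="number")

# Convert column by column; values that cannot be parsed become NaN.
converted = raw.apply(pd.to_numeric, errors="coerce")
corr = converted.corr()

print(raw.dtypes)
print(converted.dtypes)
print(corr)
```

Note that errors='coerce' hides unparseable values as NaN, while astype(int) raises on them, so astype(int) is the stricter check when every value is expected to be a clean integer.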
Upvotes: 3