So my problem lies in preparing a DataFrame for creating a heatmap using pandas and seaborn. My question is whether there is a way to keep the NaN values as NaN while converting everything from object to integer, so that I can plot it with something like sns.heatmap(df, mask=df.isnull()).
What I am doing so far is entering data into a new DataFrame I created, which looks like this upon creation: https://i.sstatic.net/hS4xX.jpg
From there I insert the values into the new DataFrame using code that looks like:
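start = 16
end = start + 10
dates = range(start, end)
for d in dates:
    col = f'Apr/{d}/2019'
    for i in jfk10day.index:
        # cheapest Compact price offered by supplier i on this pick-up date
        mask = ((jfk['Pick-up Date'] == col)
                & (jfk['Supplier'] == i)
                & (jfk['Car Type'] == 'Compact'))
        # .loc writes into jfk10day itself (assigning to an iterrows() row is not guaranteed to)
        jfk10day.loc[i, col] = jfk.loc[mask, 'Total Price'].min()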
This enters the data into the DataFrame with dtype object. The completed DataFrame looks like this: https://i.sstatic.net/oQXen.jpg
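A loop-free way to build the same supplier-by-date table is pivot_table. This is a sketch, assuming jfk has the columns 'Pick-up Date', 'Supplier', 'Car Type' and 'Total Price' as in the question; the sample rows are invented for illustration. Because pivot_table fills supplier/date combinations that have no offer with NaN, the result comes out as float64 rather than object.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question, values are invented.
jfk = pd.DataFrame({
    'Pick-up Date': ['Apr/16/2019', 'Apr/16/2019', 'Apr/17/2019', 'Apr/16/2019', 'Apr/17/2019'],
    'Supplier': ['Avis', 'Avis', 'Avis', 'Hertz', 'Avis'],
    'Car Type': ['Compact', 'Compact', 'Compact', 'Compact', 'SUV'],
    'Total Price': [120, 110, 130, 125, 200],
})

# Keep only compact cars, then take the cheapest price per supplier and date.
compact = jfk[jfk['Car Type'] == 'Compact']
jfk10day = compact.pivot_table(index='Supplier', columns='Pick-up Date',
                               values='Total Price', aggfunc='min')

# Hertz has no compact offer on Apr/17/2019, so that cell is NaN and the
# table is float64, which sns.heatmap can plot directly.
print(jfk10day)
print(jfk10day.dtypes)
```

Because the dates are strings, the columns are ordered lexicographically ('Apr/16/2019' sorts before 'Apr/9/2019'); converting 'Pick-up Date' with pd.to_datetime first would give chronological columns.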
Now I know that I need to change the dtype to int/float in order to plot it with sns.heatmap(), but when I try something like:
jfk10day = jfk10day.astype(int)
I get the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-76-45dab2567d52> in <module>
----> 1 jfk10day.astype(int)
/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
176 else:
177 kwargs[new_arg_name] = new_arg_value
--> 178 return func(*args, **kwargs)
179 return wrapper
180 return _deprecate_kwarg
/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
4999 # else, only a single dtype is given
5000 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5001 **kwargs)
5002 return self._constructor(new_data).__finalize__(self)
5003
/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py in astype(self, dtype, **kwargs)
3712
3713 def astype(self, dtype, **kwargs):
-> 3714 return self.apply('astype', dtype=dtype, **kwargs)
3715
3716 def convert(self, **kwargs):
/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
3579
3580 kwargs['mgr'] = self
-> 3581 applied = getattr(b, f)(**kwargs)
3582 result_blocks = _extend_blocks(applied, result_blocks)
3583
/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py in astype(self, dtype, copy, errors, values, **kwargs)
573 def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
574 return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 575 **kwargs)
576
577 def _astype(self, dtype, copy=False, errors='raise', values=None,
/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)
662
663 # _astype_nansafe works fine with 1-d only
--> 664 values = astype_nansafe(values.ravel(), dtype, copy=True)
665 values = values.reshape(self.shape)
666
/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy)
707 # work around NumPy brokenness, #1987
708 if np.issubdtype(dtype.type, np.integer):
--> 709 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
710
711 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
pandas/_libs/src/util.pxd in util.set_value_at_unsafe()
ValueError: cannot convert float NaN to integer
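The error can be reproduced with a small object-dtype frame (supplier names and prices are invented). NaN is itself a float value, so a NumPy int64 array has no way to hold it; casting to float instead keeps the missing cells as NaN, and df.isnull() then gives the mask the question wants to pass to sns.heatmap. This sketch uses plain float64 rather than an integer dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype frame, similar to what the loop in the question produces.
df = pd.DataFrame({'Apr/16/2019': [110, np.nan], 'Apr/17/2019': [130, 125]},
                  index=['Avis', 'Hertz'], dtype=object)

try:
    df.astype(int)  # fails: NaN cannot be stored in a NumPy integer array
except ValueError as exc:
    print('astype(int) failed:', exc)

# float64 has a native NaN, so the conversion succeeds and NaN stays NaN.
df_float = df.astype(float)
mask = df_float.isnull()
print(df_float.dtypes)
print(mask)

# With seaborn installed, this plots the prices and leaves NaN cells blank.
# seaborn also adds NaN cells to the mask on its own, so mask=mask mainly
# makes the intent explicit:
# sns.heatmap(df_float, mask=mask, annot=True, fmt='.0f')
```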
So I am wondering whether there is a way to edit my for loop so that every entry is entered as an int (the 'Total Price' column in the original DataFrame is already int), or a way to convert the new DataFrame to int while skipping over the NaN values. I need the NaN values in the heatmap to show that the supplier is not offering anything on that specific date.
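On the loop option: a DataFrame created with pd.DataFrame(index=..., columns=...) and no data has object columns, which is most likely why the filled frame ends up as object. A sketch, assuming the supplier list and the ten date labels are known up front (the supplier names below are placeholders): create the frame as float64 filled with NaN, then let the loop assign into it with .loc. Integer prices fit into float64, so the dtype stays numeric and the cells with no offer stay NaN.

```python
import numpy as np
import pandas as pd

# Placeholder supplier names; the date labels follow the question's format.
suppliers = ['Avis', 'Hertz', 'Budget']
dates = [f'Apr/{d}/2019' for d in range(16, 26)]

# Start every cell as NaN in a float64 frame instead of an empty object frame.
jfk10day = pd.DataFrame(np.nan, index=suppliers, columns=dates, dtype=float)

# Assigning integer prices cell by cell with .loc keeps the float64 dtype.
jfk10day.loc['Avis', 'Apr/16/2019'] = 110
jfk10day.loc['Hertz', 'Apr/17/2019'] = 125

print(jfk10day.dtypes.unique())
print(jfk10day.iloc[:, :3])
```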
Thanks in advance for the help guys, and if there is any more information needed from me please let me know!
Since pandas version 0.24.0 we have a nullable integer data type, Int64:
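import numpy as np
import pandas as pd

df = pd.DataFrame({'Col': [1.0, 2.0, 3.0, np.nan]})
print(df)
   Col
0  1.0
1  2.0
2  3.0
3  NaN
print(df.Col.astype('Int64'))
0      1
1      2
2      3
3    NaN
Name: Col, dtype: Int64
(From pandas 1.0 on, the missing entry is printed as <NA> rather than NaN.)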
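The same cast also works on a whole DataFrame, applied column by column. A sketch with invented prices, assuming pandas 1.0 or later: the frame is converted to the nullable Int64 dtype, and then back to float64 for plotting, on the assumption that the plotting step wants plain NumPy floats. That second conversion turns the missing values back into NaN.

```python
import numpy as np
import pandas as pd

# Invented prices; NaN marks "no offer on that date".
df = pd.DataFrame({'Apr/16/2019': [110.0, np.nan],
                   'Apr/17/2019': [130.0, 125.0]},
                  index=['Avis', 'Hertz'])

# Nullable integer dtype: whole numbers become integers, missing stays missing.
df_int = df.astype('Int64')
print(df_int.dtypes)
print(df_int)

# For plotting, cast back to float64; missing values become NaN again.
df_plot = df_int.astype(float)
print(df_plot.isnull())
```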