Reputation: 59544
I just upgraded from pandas 0.11 to 0.13.0rc1. The upgrade caused an error related to Series.fillna().
>>> df
                    sales  net_pft
STK_ID RPT_Date
600809 20060331   5.8951   1.1241
       20060630   8.3031   1.5464
       20060930  11.9084   2.2990
       20061231      NaN   2.6060
       20070331   5.9129   1.3334

[5 rows x 2 columns]
>>> type(df['sales'])
<class 'pandas.core.series.Series'>
>>> df['sales'] = df['sales'].fillna(df['net_pft'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\site-packages\pandas\core\generic.py", line 1912, in fillna
    obj.fillna(v, inplace=True)
AttributeError: 'numpy.float64' object has no attribute 'fillna'
Why does df['sales'] become a 'numpy.float64' object when it is used in fillna()? How do I correctly fill the NaN values of one column with the other column's values?
Upvotes: 4
Views: 12146
Reputation: 139262
There was a recent discussion on this, and it is fixed in pandas master: https://github.com/pydata/pandas/issues/5703 (after the release of 0.13rc1, so it will be fixed in final 0.13).
Note: the behaviour changed! Using a Series as input to fillna was not supported behaviour in pandas <= 0.12, as @behzad.nouri points out. It did work, but was apparently based on location (position), which was wrong. As long as both Series (df['sales'] and df['net_pft'] in your case) have the same index, this will not matter.
In pandas 0.13, it will be supported but based on the index of the Series. See comment here: https://github.com/pydata/pandas/issues/5703#issuecomment-30663525
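To illustrate what index-based filling means, here is a minimal sketch that assumes a pandas version where passing a Series to Series.fillna is supported. The frame is rebuilt from the values in the question; the names filled, shuffled and filled_shuffled are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the printed output).
index = pd.MultiIndex.from_product(
    [[600809], [20060331, 20060630, 20060930, 20061231, 20070331]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [5.8951, 8.3031, 11.9084, np.nan, 5.9129],
        "net_pft": [1.1241, 1.5464, 2.2990, 2.6060, 1.3334],
    },
    index=index,
)

# Same index on both sides: each NaN takes the value found at the same label.
filled = df["sales"].fillna(df["net_pft"])

# Reverse the row order of the fill Series. Because matching is done by
# index label rather than by position, the result should not change.
shuffled = df["net_pft"].iloc[::-1]
filled_shuffled = df["sales"].fillna(shuffled)

print(filled)
print(filled_shuffled.equals(filled))
```

With position-based filling, the reversed Series would have supplied a different value for the missing row, which is why the change in behaviour only matters when the two indexes differ.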
Upvotes: 3
Reputation: 78011
It seems that what you are trying to do is:
    idx = df['sales'].isnull()
    df.loc[idx, 'sales'] = df.loc[idx, 'net_pft']
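A self-contained sketch of this boolean-mask approach, with the frame rebuilt from the question's output. It is written with a single .loc assignment rather than chained indexing, since chained assignment can end up writing to a temporary copy; treat it as one way to do it, not the only one.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [[600809], [20060331, 20060630, 20060930, 20061231, 20070331]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [5.8951, 8.3031, 11.9084, np.nan, 5.9129],
        "net_pft": [1.1241, 1.5464, 2.2990, 2.6060, 1.3334],
    },
    index=index,
)

# Boolean mask: True for the rows where 'sales' is missing.
idx = df["sales"].isnull()

# Overwrite only the masked rows of 'sales' with the matching 'net_pft' rows.
df.loc[idx, "sales"] = df.loc[idx, "net_pft"]

print(df)
```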
Because what you are providing as the value argument to fillna is a Series, the code goes into the branch below, which calls fillna for every index item of the provided Series. If self were a DataFrame this would have worked correctly, that is, it would fillna each column using the provided Series. But since self here is a Series, result[k] returns a single scalar (a numpy.float64) instead of a column, and calling fillna on that scalar raises the AttributeError.
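The failure can be reproduced without calling fillna at all. The sketch below mimics the loop from that branch on two small Series; the labels "a" and "b" and the name looked_up are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

sales = pd.Series([5.8951, np.nan], index=["a", "b"])
net_pft = pd.Series([1.1241, 2.6060], index=["a", "b"])

# Mimic the loop in the quoted branch: iterate over the fill Series and
# look up each of its labels in the Series being filled.
looked_up = {}
for k, v in net_pft.items():
    if k not in sales:
        continue
    obj = sales[k]  # a scalar here, not a column
    looked_up[k] = obj
    print(k, type(obj).__name__, hasattr(obj, "fillna"))
```

Each lookup yields a NumPy scalar with no fillna method, which matches the 'numpy.float64' in the traceback.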
As the documentation says, when calling fillna on a DataFrame the parameter value can be
"alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled)".
From the source code below, if value is a Series it works the same way as a dict, using the Series' index as keys to fillna the corresponding columns.
    else:  # value is not None
        if method is not None:
            raise ValueError('cannot specify both a fill method and value')

        if len(self._get_axis(axis)) == 0:
            return self
        if isinstance(value, (dict, com.ABCSeries)):
            if axis == 1:
                raise NotImplementedError('Currently only can fill '
                                          'with dict/Series column '
                                          'by column')

            result = self if inplace else self.copy()
            for k, v in compat.iteritems(value):
                if k not in result:
                    continue
                obj = result[k]
                obj.fillna(v, inplace=True)
            return result
        else:
            new_data = self._data.fillna(value, inplace=inplace,
                                         downcast=downcast)
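For comparison, here is a small sketch of the DataFrame case that this branch was written for, where the keys of the dict (or the index of the Series) are column names. The frame and the extra column "other" are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "sales": [1.0, np.nan],
        "net_pft": [np.nan, 2.0],
        "other": [np.nan, 3.0],  # hypothetical column, deliberately not filled
    }
)

# Per-column fill values given as a dict keyed by column name.
fill_values = {"sales": 0.0, "net_pft": -1.0}
by_dict = df.fillna(fill_values)

# The same fill values given as a Series whose index holds the column names.
by_series = df.fillna(pd.Series(fill_values))

print(by_dict)
print(by_series.equals(by_dict))
```

Columns that are missing from the dict or Series, such as "other" here, keep their NaN values.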
Upvotes: 1