NumenorForLife

Reputation: 1746

Replacing NaT with Epoch in Pandas

NaT missing values appear at the end of my DataFrame, as shown below. Understandably, this raises a ValueError:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pytz/tzinfo.py", line 314, in localize
    loc_dt = tzinfo.normalize(dt.replace(tzinfo=tzinfo))
ValueError: month must be in 1..12

I've tried to use both dropna:

data[col_name].dropna(0, inplace=True)

and fillna, as encouraged by the Working with Missing Data section:

data[col_name].fillna(0, inplace=True)

Before either of these lines, I tried to clean the data by replacing non-datetimes with the epoch time:

data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime)  else epoch)

Because NaT is technically a datetime, that function did not catch it. Since isnull does handle NaT, I wrote this function to apply to data[col_name]:

def replace_time(x):
    if pd.isnull(x):
        return epoch
    elif isinstance(x, datetime.datetime):
        return x
    else:
        return epoch

Despite the fact that it enters the pd.isnull section, the value isn't changed. However, when I try that function on this series (where the second value is NaT) it works:

s = pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')],dtype='M8[ns]')

Data:

2003-04-29 00:00:00

NaT

NaT

NaT

Upvotes: 1

Views: 4088

Answers (2)

Jeff

Reputation: 128978

The main issue here is that you are using chained indexing in this expression:

data[col_name].dropna(0, inplace=True)

This potentially modifies a copy, so nothing in the original frame actually changes. It is also quite tricky to get pandas to show a SettingWithCopy warning in this case. See the pandas docs on chained indexing.
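One way to sidestep the copy problem is to assign the result back instead of relying on inplace=True. The sketch below assumes a small stand-in frame; data, col_name and epoch mirror the names in the question, but their values here are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `col_name`.
col_name = "date"
data = pd.DataFrame(
    {col_name: pd.Series([pd.Timestamp("2003-04-29"), pd.NaT, pd.NaT, pd.NaT])}
)

epoch = pd.Timestamp(0)  # 1970-01-01 00:00:00

# Option 1: drop the rows whose date is missing. This returns a new frame
# and leaves `data` itself untouched.
dropped = data.dropna(subset=[col_name])

# Option 2: fill the missing dates and write the column back explicitly,
# rather than calling fillna(..., inplace=True) on data[col_name].
data[col_name] = data[col_name].fillna(epoch)

print(dropped)
print(data)
```

Writing the column back with data[col_name] = ... makes it explicit which object is being changed, so it does not matter whether the intermediate selection was a view or a copy.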

.fillna/.dropna ARE the appropriate ways to handle missing values in datetime64[ns] columns. Using .apply is quite inefficient by comparison.

In [16]: df = pd.DataFrame({ 'date' : pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')]) })

In [17]: df
Out[17]: 
                 date
0 2013-01-01 00:00:00
1                 NaT
2 2013-01-02 09:30:00

In [18]: df.date.fillna(0)
Out[18]: 
0   2013-01-01 00:00:00
1   1970-01-01 00:00:00
2   2013-01-02 09:30:00
Name: date, dtype: datetime64[ns]
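The session above fills with the integer 0, which that pandas version interpreted as the epoch. Depending on the pandas version, an integer fill value may not be converted to a timestamp, so a more explicit variant is to pass a Timestamp. This is a minimal sketch of that variant, not part of the original session; the names epoch and filled are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.Series([
        pd.Timestamp("2013-01-01"),
        pd.NaT,
        pd.Timestamp("2013-01-02 09:30"),
    ])
})

# Spell out the epoch as a Timestamp so the fill value already has the
# same type as the column.
epoch = pd.Timestamp(0)  # 1970-01-01 00:00:00

filled = df["date"].fillna(epoch)
print(filled)
```

Because the fill value is already a Timestamp, the column should keep a datetime dtype rather than falling back to object.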

Upvotes: 2

Alexander

Reputation: 109546

Try:

data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime) 
                                       and not isinstance(x, pd.tslib.NaTType) else epoch)
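The pd.tslib.NaTType path may not be available in every pandas version. A variant that avoids it is to test for missing values with pd.isnull, which the question already notes handles NaT. The sketch below assumes a made-up object column a_col and an epoch value; both are illustrative stand-ins for the question's variables.

```python
import datetime

import pandas as pd

epoch = pd.Timestamp(0)  # 1970-01-01 00:00:00


def replace_time(x):
    """Return x if it is a real datetime, otherwise the epoch."""
    # pd.isnull is checked first, so NaT, None and NaN never reach the
    # isinstance test.
    if pd.isnull(x) or not isinstance(x, datetime.datetime):
        return epoch
    return x


# Hypothetical column mixing a valid timestamp, NaT and a non-datetime.
a_col = pd.Series([pd.Timestamp("2003-04-29"), pd.NaT, "not a date"], dtype=object)

# Assign the result to a name (or back to the frame column); apply does
# not modify a_col in place.
cleaned = a_col.apply(replace_time)
print(cleaned)
```

As with fillna, the result of apply has to be assigned somewhere; calling it without keeping the return value leaves the original column unchanged.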

Upvotes: 2
