Reputation: 1746
NaT missing values are appearing at the end of my dataframe as demonstrated below. This understandably raises the ValueError:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pytz/tzinfo.py", line 314, in localize
    loc_dt = tzinfo.normalize(dt.replace(tzinfo=tzinfo))
ValueError: month must be in 1..12
I've tried to use both dropna:
data[col_name].dropna(0, inplace=True)
and fillna, as encouraged by the Working with Missing Data section:
data[col_name].fillna(0, inplace=True)
Before either of these lines, I tried to clean the data by replacing non-datetimes with the epoch time:
data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime) else epoch)
Because NaT is technically a datetime, that lambda doesn't catch it. Since isnull does handle NaT, I wrote this function to apply to data[col_name]:
def replace_time(x):
    if pd.isnull(x):
        return epoch
    elif isinstance(x, datetime.datetime):
        return x
    else:
        return epoch
Despite the fact that it enters the pd.isnull section, the value isn't changed. However, when I try that function on this series (where the second value is NaT) it works:
s = pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')],dtype='M8[ns]')
Data:
2003-04-29 00:00:00
NaT
NaT
NaT
Upvotes: 1
Views: 4088
Reputation: 128978
The main issue here is that you are using chained indexing in this expression:
data[col_name].dropna(0, inplace=True)
This potentially modifies a copy, so nothing actually changes in data. It is quite tricky to get this pattern to show a SettingWithCopy warning. See the pandas documentation on chained indexing (returning a view versus a copy).
.fillna/.dropna ARE the appropriate ways to handle missing values in datetime64[ns] dtypes. Using .apply is quite inefficient.
In [16]: df = pd.DataFrame({ 'date' : pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')]) })
In [17]: df
Out[17]:
date
0 2013-01-01 00:00:00
1 NaT
2 2013-01-02 09:30:00
In [18]: df.date.fillna(0)
Out[18]:
0 2013-01-01 00:00:00
1 1970-01-01 00:00:00
2 2013-01-02 09:30:00
Name: date, dtype: datetime64[ns]
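To make the chained-indexing point concrete, here is a minimal sketch that assigns the result back to the column instead of calling inplace=True on a selection. The frame and the names epoch, filled and dropped are illustrative assumptions, not from the original post. It fills with pd.Timestamp(0) rather than the integer 0, on the assumption that newer pandas versions may not coerce an integer fill value to a datetime the way the session above does.

```python
import pandas as pd

# Illustrative frame mirroring the example above (the middle value is NaT).
df = pd.DataFrame({
    "date": pd.Series([
        pd.Timestamp("20130101"),
        pd.NaT,
        pd.Timestamp("20130102 9:30"),
    ])
})

epoch = pd.Timestamp(0)  # 1970-01-01 00:00:00

# Option 1: fill the missing values and assign the result back to the column.
filled = df.copy()
filled["date"] = filled["date"].fillna(epoch)

# Option 2: drop the rows whose date is missing.
dropped = df.dropna(subset=["date"])
```

Either way the result is bound to a name you control, so it does not matter whether the intermediate selection was a view or a copy.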
Upvotes: 2
Reputation: 109546
Try:
data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime)
                             and not isinstance(x, pd.tslib.NaTType) else epoch)
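If pd.tslib is not available in your pandas version (I believe it was removed in later releases), a sketch of the same check using pd.isnull might look like this. The sample column a_col and the name cleaned are made up for illustration.

```python
import datetime

import pandas as pd

epoch = pd.Timestamp(0)  # 1970-01-01 00:00:00

# Made-up object column mixing a real timestamp, NaT, a string and None.
a_col = pd.Series(
    [pd.Timestamp("2003-04-29"), pd.NaT, "not a date", None],
    dtype=object,
)

# Keep real datetimes; replace NaT and anything that is not a datetime.
# NaT passes the isinstance check, so pd.isnull is what filters it out.
cleaned = a_col.apply(
    lambda x: x if isinstance(x, datetime.datetime) and not pd.isnull(x) else epoch
)
```

As in the lambda above, the result is assigned to a new name (or back to data[col_name]) rather than modified in place.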
Upvotes: 2