Reputation: 2325
I have a bigger dataframe than what I'm showing here but what I'm trying to do is wherever there is certain value in a series (or even better the whole datarame) to change that value to a None. I need these to be None so I can write the dataframe to a database and it will be recognised as null.
series = (['2014/06/05 13:03:56', '2014/07/23 13:03:56', None, '2014/08/21 13:03:56'])
data = pd.DataFrame(series)
0 2014/06/05 13:03:56
1 2014/07/23 13:03:56
2 None
3 2014/08/21 13:03:56
data = pd.to_datetime(data[0], coerce=True)
data
0 2014-06-05 13:03:56
1 2014-07-23 13:03:56
2 NaT
3 2014-08-21 13:03:56
Name: 0, dtype: datetime64[ns]
data = data.map(str)
data
0 2014-06-05 13:03:56
1 2014-07-23 13:03:56
2 NaT
3 2014-08-21 13:03:56
Name: 0, dtype: object
data.replace(to_replace='NaT', value=None)
0 2014-06-05 13:03:56
1 2014-07-23 13:03:56
2 2014-07-23 13:03:56
3 2014-08-21 13:03:56
Name: 0, dtype: object
In the above example, when I try to replace 'NaT' the dataframe actually fills the value with preceeding value and not None. This won't help as it needs to be None. In the actual dataframe I'm working with this usually throws up a type error telling me that I can't do replace None with method pad. I'm using a datetime series here but really I'll need this for more than just datetime series. It seems like it should be basic functionality with pandas but I can't find an answer.
Thanks, Colin
Upvotes: 3
Views: 6281
Reputation: 2325
In the end, this worked for my needs. I don't think map(str) was working in the code I gave for the question but it works below where I assign it to only one field instead of the whole dataframe.
def change_date_to_string(field):
data[field] = data[field].map(str)
data[field].loc[data[field] == 'NaT'] = None
change_date_to_string(field='usr_datetime')
Upvotes: 0
Reputation: 366213
First, the reason your code isn't working is that those NaT
values aren't the string 'NaT'
, they're the value pd.NaT
. But since I don't think fixing that would actually give you what you want, let's ignore that for the moment.
A Pandas DataFrame
, like the NumPy ndarray
it's built on, is a compact array of typed, low-level values. That's what makes it small, fast, and type-safe. But that inherently means it can only store values of the specified type. And None
is not a value of the datetime64[ns]
type.
More specifically, a datetime64[ns]
can only hold 64-bit integers that represent datetimes as nanoseconds since the epoch, and None
is not a 64-bit integer.
Pandas does have a special value to deal with this, called NaT
, for "Not a Time"; that's the closest thing to None
that you can get in any kind of datetime field (just like the more familiar NaN
for floats). And that's what you've already got.
Meanwhile, Pandas has special support for None
in various different places that tries to do what you often want—storing NaN/NaT/0, or repeating the last value, or various other things. But when that's not what you want, that isn't much help.
If you actually need to have None
, the only way to do that is to store normal boxed-up Python objects instead of typed low-level values, which you do by using dtype=object
. Then you can stick any Python value into any element, including None
, of course.
But doing that largely defeats the purpose of using Pandas and NumPy. You're probably better off using NaT
, and changing the rest of your code—either to expect NaT
where you're currently expecting None
, or to wrap the DataFrame
in something that converts NaT
values to None
during extraction or printout.
For completeness, if you want to get crazy, there's nothing stopping you from defining an optionaldatetime64[ns]
dtype that's just like datetime64[ns]
except that it uses the special value reserved for NaT
to mean None
instead. Or it could even reserve another special value, or a whole bit, to mean None
, while leaving NaT
alone. Either way, this would be a lot of work, and it would completely break any operations that depend on datetime arithmetic (d - NaT == NaT
for any d
, but d - None
is a TypeError
for any d
…), and ultimately it's no better than the wrapper solution for any purpose I can think of…
Upvotes: 5