Reputation: 768
I have two columns, "asked" and "answered", but "answered" is an object while "asked" is a datetime64[ns]. So I convert "answered" to datetime:
df['answered'] = pd.to_datetime(df['answered'])
index, asked, answered
0 2016-07-04 07/07/2016
1 2016-07-03 07/01/2016
2 2016-07-05 07/09/2016
3 NaT NaN
Then, I made a 3rd column that gives me the difference in time between the two:
df['Days']= df['answered'] - df['asked']
index, asked, answered, Days
0 2016-07-04 07/07/2016 3 days
1 2016-07-03 07/01/2016 -2 days
2 2016-07-05 07/09/2016 4 days
3 NaT NaN NaT
With the help of @piRSquared, I was trying to turn the negative Days into a NaT, but nothing happened when I did this:
df.update(df[['Days']].mask(df < 0))
How can I turn the negative days to a NaT?
Upvotes: 2
Views: 1069
Reputation: 862771
For me it works to compare the Series (column) with a zero Timedelta, then set NaT with Series.mask or loc:
mask = df['Days'] < pd.Timedelta(0)
df['Days'] = df['Days'].mask(mask)
print (df)
asked answered Days
0 2016-07-04 2016-07-07 3 days
1 2016-07-03 2016-07-01 NaT
2 2016-07-05 2016-07-09 4 days
3 NaT NaT NaT
Or:
mask = df['Days'] < pd.Timedelta(0)
df.loc[mask, 'Days'] = np.nan
print (df)
asked answered Days
0 2016-07-04 2016-07-07 3 days
1 2016-07-03 2016-07-01 NaT
2 2016-07-05 2016-07-09 4 days
3 NaT NaT NaT
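For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies both variants. The explicit format string passed to to_datetime and the variable names negative and via_loc are my own assumptions, not part of the original post.

```python
import pandas as pd

# Rebuild the question's data; "answered" starts out as strings (object dtype).
df = pd.DataFrame({
    "asked": pd.to_datetime(["2016-07-04", "2016-07-03", "2016-07-05", None]),
    "answered": ["07/07/2016", "07/01/2016", "07/09/2016", None],
})
# Assumption: the strings are month/day/year.
df["answered"] = pd.to_datetime(df["answered"], format="%m/%d/%Y")
df["Days"] = df["answered"] - df["asked"]

# Compare the Series with a zero Timedelta; NaT compares as False.
negative = df["Days"] < pd.Timedelta(0)

# Variant 1: Series.mask replaces the flagged rows with NaT.
via_mask = df["Days"].mask(negative)

# Variant 2: assign NaT through loc on a copy.
via_loc = df.copy()
via_loc.loc[negative, "Days"] = pd.NaT

print(via_mask)
print(via_loc)
```

Both variants should leave rows 0 and 2 untouched and turn row 1 into NaT; row 3 was already NaT.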
But comparing a whole DataFrame with a zero Timedelta is buggy:
print (df)
asked answered Days Days2
0 2016-07-04 2016-07-07 3 days 3 days
1 2016-07-03 2016-07-01 -2 days -2 days
2 2016-07-05 2016-07-09 4 days 4 days
3 NaT NaT NaT NaT
df1 = df.select_dtypes([np.timedelta64])
#returns the wrong mask
m1 = df1 < pd.Timedelta(0)
print (m1)
Days Days2
0 False False
1 False False
2 False False
3 True True
#comparing each Series via apply works
m2 = df1.apply(lambda x: x < pd.Timedelta(0))
print (m2)
Days Days2
0 False False
1 True True
2 False False
3 False False
#comparing the numpy array works, but raises a warning
m3 = df1.values < np.array(0, dtype=np.timedelta64)
print (m3)
[[False False]
[ True True]
[False False]
[ True True]]
FutureWarning: In the future, 'NAT < x' and 'x < NAT' will always be False.
df[df1.columns] = df1.mask(m2)
print (df)
asked answered Days Days2
0 2016-07-04 2016-07-07 3 days 3 days
1 2016-07-03 2016-07-01 NaT NaT
2 2016-07-05 2016-07-09 4 days 4 days
3 NaT NaT NaT NaT
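The column-by-column idea can also be written as a plain loop, which sidesteps the DataFrame-level comparison entirely. This is only a sketch under my own assumptions: the extra label column and the name td_cols are made up for illustration, and I select the columns with the "timedelta" string alias rather than np.timedelta64.

```python
import pandas as pd

values = [pd.Timedelta(days=3), pd.Timedelta(days=-2), pd.Timedelta(days=4), pd.NaT]
df = pd.DataFrame({
    "Days": pd.to_timedelta(values),
    "Days2": pd.to_timedelta(values),
    "label": list("abcd"),  # hypothetical non-timedelta column, left alone
})

# Pick out only the timedelta columns.
td_cols = df.select_dtypes(include="timedelta").columns

# Mask each column on its own, comparing Series against a zero Timedelta.
for col in td_cols:
    df[col] = df[col].mask(df[col] < pd.Timedelta(0))

print(df)
```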
Upvotes: 1
Reputation: 294338
Use mask
df.mask(df < 0)
Days col2
index
0 20 days NaT
1 61 days 78 days
2 NaT 10 days
pd.DataFrame.mask takes an array of booleans that identifies the positions to mask. If the optional second parameter is passed, the values at those positions are replaced with the value(s) it specifies. If it is not passed, as in my proposed solution, the identified values are replaced with a null value. Since the dtype of these columns is timedelta, the appropriate null value is NaT.
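To make the role of the second parameter concrete, here is a small sketch with made-up data. The frame and the names cond, as_null and as_zero are assumptions for illustration, and the boolean frame is built column by column against a zero Timedelta rather than with df < 0.

```python
import pandas as pd

df = pd.DataFrame({
    "Days": pd.to_timedelta([20, 61, -5], unit="D"),
    "col2": pd.to_timedelta([-3, 78, 10], unit="D"),
})

# Boolean frame marking the negative entries.
cond = df.apply(lambda col: col < pd.Timedelta(0))

# No second argument: flagged positions become the null value, NaT here.
as_null = df.mask(cond)

# With a second argument: flagged positions take that value instead.
as_zero = df.mask(cond, pd.Timedelta(0))

print(as_null)
print(as_zero)
```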
Suppose your dataframe consists of many columns. You can focus on just the specific columns you care about.
df[['Days', 'col2']].mask(df < 0)
And then you can update df in place with
df.update(df[['Days', 'col2']].mask(df < 0))
Suppose you wanted to grab all columns that were timedelta
df.select_dtypes([np.timedelta64]).mask(df < 0)
and update
df.update(df.select_dtypes([np.timedelta64]).mask(df < 0))
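One thing worth checking if update appears to do nothing: as far as I understand the documented behaviour, DataFrame.update only copies non-NA values from the other frame, so a NaT produced by mask would be skipped and the old value kept. The sketch below compares update with plain assignment on made-up data; the names masked, updated and assigned are mine.

```python
import pandas as pd

df = pd.DataFrame({"Days": pd.to_timedelta([3, -2, 4], unit="D")})

# Masked copy of the column, with NaT where the value was negative.
masked = df["Days"].mask(df["Days"] < pd.Timedelta(0)).to_frame()

# update() ignores NA values in `masked`, so the -2 days entry survives.
updated = df.copy()
updated.update(masked)

# Plain assignment writes the NaT through.
assigned = df.copy()
assigned["Days"] = masked["Days"]

print(updated)
print(assigned)
```

If that matches what you see, assigning the masked column back is the simpler route.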
Upvotes: 4