Adam Schroeder

Reputation: 768

Convert negative datetime to NaT

I have two columns, "asked" and "answered". "answered" has dtype object while "asked" is datetime64[ns], so I convert "answered" to datetime:

df['answered'] = pd.to_datetime(df['answered'])

index  asked       answered
0      2016-07-04  07/07/2016
1      2016-07-03  07/01/2016
2      2016-07-05  07/09/2016
3      NaT         NaN

Then I added a third column that holds the time difference between the two:

df['Days']= df['answered'] - df['asked']

index  asked       answered    Days
0      2016-07-04  07/07/2016  3 days
1      2016-07-03  07/01/2016  -2 days
2      2016-07-05  07/09/2016  4 days
3      NaT         NaN         NaT

With the help of @piRSquared, I was trying to turn the negative Days into a NaT, but nothing happened when I did this:

df.update(df[['Days']].mask(df < 0))

How can I turn the negative days to a NaT?

Upvotes: 2

Views: 1069

Answers (2)

jezrael

Reputation: 862771

For me it works to compare the Series (column) with a zero Timedelta, then set NaT with Series.mask or loc:

mask = df['Days'] < pd.Timedelta(0)
df['Days'] = df['Days'].mask(mask)
print (df)
       asked   answered   Days
0 2016-07-04 2016-07-07 3 days
1 2016-07-03 2016-07-01    NaT
2 2016-07-05 2016-07-09 4 days
3        NaT        NaT    NaT

Or:

mask = df['Days'] < pd.Timedelta(0)
df.loc[mask, 'Days'] = np.nan
print (df)
       asked   answered   Days
0 2016-07-04 2016-07-07 3 days
1 2016-07-03 2016-07-01    NaT
2 2016-07-05 2016-07-09 4 days
3        NaT        NaT    NaT
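The two variants above can be tried end to end with a small self-contained sketch. The frame below is rebuilt from the sample data in the question; the explicit `format="%m/%d/%Y"` and the variable name `negative` are assumptions added for illustration.

```python
import pandas as pd

# Rebuild the question's sample data (the date format is assumed to be month/day/year).
df = pd.DataFrame({
    "asked": pd.to_datetime(["2016-07-04", "2016-07-03", "2016-07-05", None]),
    "answered": pd.to_datetime(
        ["07/07/2016", "07/01/2016", "07/09/2016", None], format="%m/%d/%Y"
    ),
})
df["Days"] = df["answered"] - df["asked"]

# Compare the timedelta column with a zero Timedelta, not with the integer 0.
negative = df["Days"] < pd.Timedelta(0)

# Series.mask replaces the flagged rows with the null value of the dtype (NaT here).
df["Days"] = df["Days"].mask(negative)
print(df)
```

Rows where "Days" is already NaT compare as False, so they are left untouched by the mask.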

But comparing a whole DataFrame with a zero Timedelta is buggy (at least in the pandas version I tested):

print (df)
       asked   answered    Days   Days2
0 2016-07-04 2016-07-07  3 days  3 days
1 2016-07-03 2016-07-01 -2 days -2 days
2 2016-07-05 2016-07-09  4 days  4 days
3        NaT        NaT     NaT     NaT

df1 = df.select_dtypes([np.timedelta64])

# comparing the whole DataFrame returns a wrong mask
m1 = df1 < pd.Timedelta(0)
print (m1)
    Days  Days2
0  False  False
1  False  False
2  False  False
3   True   True

# comparing column by column (each Series) with apply works
m2 = df1.apply(lambda x: x < pd.Timedelta(0))
print (m2)
    Days  Days2
0  False  False
1   True   True
2  False  False
3  False  False

# comparing the underlying numpy array flags negatives, but NaT is also flagged and a warning is raised
m3 = df1.values < np.array(0, dtype=np.timedelta64)
print (m3)
[[False False]
 [ True  True]
 [False False]
 [ True  True]]

FutureWarning: In the future, 'NAT < x' and 'x < NAT' will always be False.

df[df1.columns] = df1.mask(m2)
print (df)
       asked   answered   Days  Days2
0 2016-07-04 2016-07-07 3 days 3 days
1 2016-07-03 2016-07-01    NaT    NaT
2 2016-07-05 2016-07-09 4 days 4 days
3        NaT        NaT    NaT    NaT
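For reference, the column-by-column approach can be written as a short standalone sketch. The two-row frame and the names `td_cols` and `negative` are made up for illustration; only the `select_dtypes`, `apply` and `mask` steps come from the answer above.

```python
import pandas as pd

# Hypothetical frame with one datetime column and two timedelta columns.
df = pd.DataFrame({
    "asked": pd.to_datetime(["2016-07-04", "2016-07-03"]),
    "Days": pd.to_timedelta([3, -2], unit="D"),
    "Days2": pd.to_timedelta([-1, 5], unit="D"),
})

# Pick out only the timedelta columns.
td_cols = df.select_dtypes(include="timedelta").columns

# Build the boolean mask one column (Series) at a time.
negative = df[td_cols].apply(lambda col: col < pd.Timedelta(0))

# Replace negative durations with NaT and write the columns back.
df[td_cols] = df[td_cols].mask(negative)
print(df)
```

Building the mask per column sidesteps the whole-DataFrame comparison that produced the wrong result above.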

Upvotes: 1

piRSquared

Reputation: 294338

Use mask

df.mask(df < 0)

         Days    col2
index                
0     20 days     NaT
1     61 days 78 days
2         NaT 10 days

pd.DataFrame.mask takes an array of booleans that identify the positions to mask. If the optional second parameter is passed, it will replace the values in the identified positions with the value(s) specified in the optional parameter. If that parameter is not passed, as in my proposed solution, then the identified values are replaced with a null value. Since the dtypes of these columns are timedelta the appropriate null value will be NaT.
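A minimal sketch of the two call forms described above, shown on a Series for brevity (Series.mask takes the same condition and optional replacement arguments). The sample values and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical timedelta data with one negative value.
days = pd.Series(pd.to_timedelta([3, -2, 4], unit="D"))
negative = days < pd.Timedelta(0)

# Without a replacement value, masked positions become the dtype's null (NaT).
as_nat = days.mask(negative)

# With a replacement value, masked positions take that value instead.
as_zero = days.mask(negative, pd.Timedelta(0))

print(as_nat)
print(as_zero)
```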


Suppose your dataframe consists of many columns. You can focus on just the specific columns you care about.

df[['Days', 'col2']].mask(df < 0)

You can then update df in place with

df.update(df[['Days', 'col2']].mask(df < 0))

Suppose you wanted to grab all columns that are timedelta:

df.select_dtypes([np.timedelta64]).mask(df < 0)

and update

df.update(df.select_dtypes([np.timedelta64]).mask(df < 0))

Upvotes: 4
