Reputation: 67
I have a column that calculates the duration of seconds it takes from A to B in format hh:mm:ss. However, A and B may show null values in the data.
Let's say A=05:15:00 and B=naT, then the subtraction will return 5:15 seconds which is misleading and wrong in context due to B being infinity! How can I specify to only subtract columns B from A if both columns are NOT NULL?
This is the code I have:
df['A_to_B']=(df.B-df.A).dt.total_seconds()
Upvotes: 0
Views: 1465
Reputation: 150735
You can do:
df['A_to_B'] = np.where(df['A'].notna() & df['B'].notna(),
(df['A'] - df['B']).dt.total_seconds(),
np.nan)
Sample data
A B
0 05:15:00 NaT
1 NaT 00:00:15
2 05:15:00 00:15:00
Output:
A B A_to_B
0 05:15:00 NaT NaN
1 NaT 00:00:15 NaN
2 05:15:00 00:15:00 18000.0
Upvotes: 0
Reputation: 36
Python does not use null, but it does use a type called None to represent the absence of a value / type. So you would check if df.B and df.A are both not None, perhaps like this:
if (df.A is not None) and (df.B is not None):
df['A_to_B'] = (df.B-df.A).dt.total_seconds()
Upvotes: 1