Reputation: 13
I have a DataFrame df with 2000 rows. The first few rows look like the following:
+--------------+----------+
| Timestamp    | Duration |
+--------------+----------+
| 1/5/18 0:23  | 22       |
| 1/5/18 3:34  | 32       |
| 1/5/18 23:48 | 24       |
| 1/6/18 2:26  | 15       |
| 1/6/18 14:13 | 23       |
+--------------+----------+
I would like to create a new column C based on the following criteria: if Duration >= 20, add 20 minutes to the Timestamp; if Duration < 20, set C to null.
So, given the above information and data, the new table should look like:
+--------------+----------+--------------+
| Timestamp    | Duration | C            |
+--------------+----------+--------------+
| 1/5/18 0:23  | 22       | 1/5/18 0:43  |
| 1/5/18 3:34  | 32       | 1/5/18 3:54  |
| 1/5/18 23:48 | 24       | 1/6/18 0:08  |
| 1/6/18 2:26  | 15       | Null         |
| 1/6/18 14:13 | 23       | 1/6/18 14:33 |
+--------------+----------+--------------+
Any ideas on how to do that?
Upvotes: 1
Views: 2101
Reputation: 402263
A little datetime arithmetic ought to do it.
import numpy as np
import pandas as pd
df['C'] = pd.to_datetime(df.Timestamp) + pd.to_timedelta(
    np.where(df.Duration >= 20, 20, np.nan), unit='m')
df['C']
0 2018-01-05 00:43:00
1 2018-01-05 03:54:00
2 2018-01-06 00:08:00
3 NaT
4 2018-01-06 14:33:00
Name: C, dtype: datetime64[ns]
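For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the five sample rows in the question. It swaps np.where for Series.where, which is an alternative way to get the same result and not the code above; the explicit format string is an assumption that the timestamps are month/day/two-digit year.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Timestamp": ["1/5/18 0:23", "1/5/18 3:34", "1/5/18 23:48",
                  "1/6/18 2:26", "1/6/18 14:13"],
    "Duration": [22, 32, 24, 15, 23],
})

# Assumption: timestamps are month/day/two-digit-year, 24-hour clock.
ts = pd.to_datetime(df["Timestamp"], format="%m/%d/%y %H:%M")

# Add 20 minutes to every row, then keep the value only where
# Duration >= 20; Series.where leaves NaT in the other rows.
df["C"] = (ts + pd.Timedelta(minutes=20)).where(df["Duration"] >= 20)
```

Computing the shifted timestamp for every row and masking afterwards avoids building a float array with NaN just to mark the missing rows.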
Feel free to format the result as needed:
df['C'] = df['C'].dt.strftime('%m/%d/%y %H:%M')
Note: if your dates are written with the day first (day/month/year), change the Timestamp conversion code to
pd.to_datetime(df.Timestamp, dayfirst=True, errors='coerce')
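With dayfirst=True, an ambiguous date such as 1/5/18 is read as 1 May rather than January 5, and errors='coerce' turns any value that cannot be parsed into NaT instead of raising.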
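To see the difference the flag makes, here is a small sketch. The three strings in raw are made-up illustrative values, not data from the question.

```python
import pandas as pd

# Hypothetical inputs: an ambiguous date, an unambiguous day-first date,
# and a value that is not a date at all.
raw = pd.Series(["1/5/18 0:23", "13/5/18 3:34", "not a date"])

# Default parsing reads the first field of an ambiguous date as the month.
month_first = pd.to_datetime(raw.iloc[:1])

# dayfirst=True reads the first field as the day; errors="coerce"
# replaces unparseable values with NaT instead of raising.
day_first = pd.to_datetime(raw, dayfirst=True, errors="coerce")
```

Depending on the pandas version, the second call may emit a warning about falling back to per-element parsing; passing an explicit format such as '%d/%m/%y %H:%M' is one way to avoid that.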
Upvotes: 1