luser3228561
luser3228561

Reputation: 37

Calculating new column value in dataframe based on next row's column value

My lack of experience in working with python in the past year has made me rather rusty, and I'm getting back into coding again.

I have a dataframe of events that has a 'start_time' column. What I need to do is create an 'end_time' column that has a time value that is 1 second less than the next row's start_time. This is an ask for doing event time calculations.

The desired output:

start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08  09:15:03

I currently have code that will accomplish this, but from anything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.

for ndx, row in df.iterrows():
    if ndx != df[atnp_df.columns[0]].count() - 1:
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)

(Hey, it works, but it's slow...)

How do I do this pythonically? I know the solution should be something like this:

df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)

But, this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.

Any and all help is greatly appreciated!

Upvotes: 1

Views: 216

Answers (1)

piRSquared
piRSquared

Reputation: 294478

offsets

df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))

  start_time        end_time
0   00:00:00 0 days 07:59:59
1   08:00:00 0 days 08:20:04
2   08:20:05 0 days 08:29:19
3   08:29:20 0 days 08:29:20
4   08:29:21 0 days 08:35:14
5   08:35:15 0 days 08:55:21
6   08:55:22 0 days 08:57:20
7   08:57:21 0 days 09:02:23
8   09:02:24 0 days 09:14:07
9   09:14:08             NaT

A little cleaned up and returning formatted strings:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08       NaT
​

Upvotes: 4

Related Questions