Reputation: 2097
I have time series data such as
time action
------------------
10:00 move left
09:45 move right
09:40 move up
09:40 move away
09:35 move down
How can I create a new field that shows the sequence of actions, but if two actions have the same timestamp, they should have the same sequence number, such as:
time action seqNo
----------------------------
10:00 move left 1
09:45 move right 2
09:40 move up 3
09:40 move away 3
09:35 move down 4
Upvotes: 1
Views: 30
Reputation: 59519
You can rank
the time column. If your times are all well formatted you can convert to a timedelta
, otherwise pd.to_datetime
is more flexible and will just use today's date to fill the default value. Since every row gets the same it wont matter.
df['SeqNo'] = pd.to_timedelta('00:'+df.time).rank(ascending=False, method='dense').astype(int)
#df['seqNo'] = pd.to_datetime(df.time).rank(ascending=False, method='dense').astype(int)
time action seqNo
0 10:00 move_left 1
1 09:45 move_right 2
2 09:40 move_up 3
3 09:40 move_away 3
4 09:35 move_down 4
If your rows are guaranteed to be in order, create a Boolean Series that checks if the time is not the same as the previous row then cumsum
. No need for any date conversions.
df['seqNo'] = df['time'].ne(df['time'].shift()).cumsum()
Upvotes: 2