Sam CD
Sam CD

Reputation: 2097

Create Time Series Sequence With Duplicates

I have time series data such as

time    action
------------------
10:00   move left
09:45   move right
09:40   move up
09:40   move away
09:35   move down

How can I create a new field that shows the sequence of actions, but if two actions have the same timestamp, they should have the same sequence number, such as:

time    action        seqNo
----------------------------
10:00   move left     1
09:45   move right    2
09:40   move up       3
09:40   move away     3
09:35   move down     4

Upvotes: 1

Views: 30

Answers (1)

ALollz
ALollz

Reputation: 59519

You can rank the time column. If your times are all well formatted you can convert to a timedelta, otherwise pd.to_datetime is more flexible and will just use today's date to fill the default value. Since every row gets the same it wont matter.

df['SeqNo'] = pd.to_timedelta('00:'+df.time).rank(ascending=False, method='dense').astype(int)
#df['seqNo'] = pd.to_datetime(df.time).rank(ascending=False, method='dense').astype(int)

    time      action  seqNo
0  10:00   move_left      1
1  09:45  move_right      2
2  09:40     move_up      3
3  09:40   move_away      3
4  09:35   move_down      4

If your rows are guaranteed to be in order, create a Boolean Series that checks if the time is not the same as the previous row then cumsum. No need for any date conversions.

df['seqNo'] = df['time'].ne(df['time'].shift()).cumsum()

Upvotes: 2

Related Questions