Reputation: 515
I have following table in Pandas:
Date System Action
20.08.2017 A 9:01:01
20.08.2017 B 9:05:11
20.08.2017 B 9:05:11
20.08.2017 B 9:10:00
20.08.2017 B 9:05:11
21.08.2017 A 8:03:05
21.08.2017 A 8:03:05
22.09.2017 A 8:05:00
22.09.2017 B 9:05:11
22.09.2017 B 9:05:11
22.09.2017 B 9:05:11
I would like to find duplicities in “Action time” but only for the same system and date. And than mark them to Seq column. Thus results should look like this:
Date System Action Seq
20.08.2017 A 9:01:01 1
20.08.2017 B 9:05:11 1
20.08.2017 B 9:05:11 2
20.08.2017 B 9:10:00 1
20.08.2017 B 9:05:11 3
21.08.2017 A 8:03:05 1
21.08.2017 A 8:03:05 2
22.09.2017 A 8:05:00 1
22.09.2017 B 9:05:11 1
22.09.2017 B 9:05:11 2
22.09.2017 B 9:05:11 3
The first occurrence of time for system “A” in new day has “1”. All following entries for the same day and same system have +1. The counter is separate for all days and all systems.
Can you please hint me with some pythonic way how to do this? All I can think of is a loop over all data. But the data set is quite large…
Upvotes: 1
Views: 45
Reputation: 402583
This looks like a job for df.groupby
followed by dfGroupBy.cumcount
:
In [1018]: df.assign(Seq=df.groupby(['Date', 'System', 'Action']).cumcount() + 1)
Out[1018]:
Date System Action Seq
0 20.08.2017 A 9:01:01 1
1 20.08.2017 B 9:05:11 1
2 20.08.2017 B 9:05:11 2
3 20.08.2017 B 9:10:00 1
4 20.08.2017 B 9:05:11 3
5 21.08.2017 A 8:03:05 1
6 21.08.2017 A 8:03:05 2
7 22.09.2017 A 8:05:00 1
8 22.09.2017 B 9:05:11 1
9 22.09.2017 B 9:05:11 2
10 22.09.2017 B 9:05:11 3
Upvotes: 4