PeterL

Reputation: 515

Finding and counting duplicates based on other columns

I have following table in Pandas:

Date    System  Action
20.08.2017  A   9:01:01
20.08.2017  B   9:05:11
20.08.2017  B   9:05:11
20.08.2017  B   9:10:00
20.08.2017  B   9:05:11
21.08.2017  A   8:03:05
21.08.2017  A   8:03:05
22.09.2017  A   8:05:00
22.09.2017  B   9:05:11
22.09.2017  B   9:05:11
22.09.2017  B   9:05:11

I would like to find duplicates in the “Action” column, but only within the same System and Date, and then mark them in a Seq column. The result should look like this:

Date    System  Action  Seq
20.08.2017  A   9:01:01 1
20.08.2017  B   9:05:11 1
20.08.2017  B   9:05:11 2
20.08.2017  B   9:10:00 1
20.08.2017  B   9:05:11 3
21.08.2017  A   8:03:05 1
21.08.2017  A   8:03:05 2
22.09.2017  A   8:05:00 1
22.09.2017  B   9:05:11 1
22.09.2017  B   9:05:11 2
22.09.2017  B   9:05:11 3

The first occurrence of a time for a given system on a new day gets “1”. Every subsequent identical entry for the same day and system is incremented by 1. The counter is kept separately for each day and each system.

Can you please hint me at a Pythonic way to do this? All I can think of is a loop over all the data, but the data set is quite large…

Upvotes: 1

Views: 45

Answers (1)

cs95

Reputation: 402583

This looks like a job for df.groupby followed by GroupBy.cumcount:

In [1018]: df.assign(Seq=df.groupby(['Date', 'System', 'Action']).cumcount() + 1)
Out[1018]: 
          Date System   Action  Seq
0   20.08.2017      A  9:01:01    1
1   20.08.2017      B  9:05:11    1
2   20.08.2017      B  9:05:11    2
3   20.08.2017      B  9:10:00    1
4   20.08.2017      B  9:05:11    3
5   21.08.2017      A  8:03:05    1
6   21.08.2017      A  8:03:05    2
7   22.09.2017      A  8:05:00    1
8   22.09.2017      B  9:05:11    1
9   22.09.2017      B  9:05:11    2
10  22.09.2017      B  9:05:11    3
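
For reference, here is a self-contained sketch that rebuilds the sample data from the question and applies the same groupby/cumcount call, so the result can be verified end to end:

```python
import pandas as pd

# Reconstruct the sample data from the question.
df = pd.DataFrame({
    "Date":   ["20.08.2017"] * 5 + ["21.08.2017"] * 2 + ["22.09.2017"] * 4,
    "System": ["A", "B", "B", "B", "B", "A", "A", "A", "B", "B", "B"],
    "Action": ["9:01:01", "9:05:11", "9:05:11", "9:10:00", "9:05:11",
               "8:03:05", "8:03:05", "8:05:00", "9:05:11", "9:05:11",
               "9:05:11"],
})

# cumcount numbers the rows within each (Date, System, Action) group
# starting at 0, so adding 1 gives the desired 1-based Seq column.
out = df.assign(Seq=df.groupby(["Date", "System", "Action"]).cumcount() + 1)
print(out["Seq"].tolist())  # [1, 1, 2, 1, 3, 1, 2, 1, 1, 2, 3]
```

Because cumcount is computed per group, the counter resets automatically whenever any of Date, System, or Action changes, with no explicit loop required.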

Upvotes: 4
