sshr

Reputation: 155

Finding duplicate rows python

I have timestamp and id variables in my dataframe (df)

timestamp                    id
2016-06-09 8:33:37           a1  
2016-06-09 8:33:37           a1  
2016-06-09 8:33:38           a1
2016-06-09 8:33:39           a1
2016-06-09 8:33:39           a1
2016-06-09 8:33:37           b1
2016-06-09 8:33:38           b1

No id should have two identical timestamps, so I have to print the duplicated timestamps for each id. In the case above, the output should be rows 1, 2, 4 and 5.

The following code gives the duplicate timestamps (comparing values directly, since a pandas Series has no value-counting count(x) like a list does):

set(x for x in df['timestamp'] if (df['timestamp'] == x).sum() > 1)

How to consider id along with timestamp to have the duplicate rows?

Upvotes: 2

Views: 6409

Answers (2)

Little Bobby Tables

Reputation: 4744

If you wish to find all duplicates, use the duplicated method. It only works on the columns; df.index.duplicated, on the other hand, works on the index. Therefore we first do a quick reset_index to bring the index into the columns.

df = df.reset_index()
df.loc[df.duplicated(keep=False)]

    index               id
0   2016-06-09 8:33:37  a1
1   2016-06-09 8:33:37  a1
3   2016-06-09 8:33:39  a1
4   2016-06-09 8:33:39  a1

If you just wish to remove duplicates there is the DataFrame method drop_duplicates.

df = df.reset_index()
df = df.drop_duplicates() # keep='first' by default.

    index               id
0   2016-06-09 8:33:37  a1
2   2016-06-09 8:33:38  a1
3   2016-06-09 8:33:39  a1
5   2016-06-09 8:33:37  b1
6   2016-06-09 8:33:38  b1

If you wish to get your old index back after either of the above, call set_index with the default column name 'index':

df.set_index('index')

                    id
index               
2016-06-09 8:33:37  a1
2016-06-09 8:33:38  a1
2016-06-09 8:33:39  a1
2016-06-09 8:33:37  b1
2016-06-09 8:33:38  b1

The above methods allow you to choose whether to keep the first, last or none of the duplicates by setting the keep parameter to 'first', 'last' or False. So to remove all duplicates from df, use keep=False.
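A minimal sketch (with made-up data, not the asker's frame) of how keep changes the result of drop_duplicates:

```python
import pandas as pd

# Two identical rows (0 and 1) plus one unique row (2).
df = pd.DataFrame({'timestamp': ['8:33:37', '8:33:37', '8:33:38'],
                   'id': ['a1', 'a1', 'a1']})

print(df.drop_duplicates(keep='first'))  # keeps rows 0 and 2
print(df.drop_duplicates(keep='last'))   # keeps rows 1 and 2
print(df.drop_duplicates(keep=False))    # keeps only row 2
```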

Upvotes: 0

jezrael

Reputation: 862541

You can use groupby and get a mask of all duplicated values per group with Series.duplicated. Then use boolean indexing:

print(df.groupby(['id'])['timestamp'].apply(lambda x: x.duplicated(keep=False)))
0     True
1     True
2    False
3     True
4     True
5    False
6    False
Name: timestamp, dtype: bool

print(df[df.groupby(['id'])['timestamp'].apply(lambda x: x.duplicated(keep=False))])
            timestamp  id
0 2016-06-09 08:33:37  a1
1 2016-06-09 08:33:37  a1
3 2016-06-09 08:33:39  a1
4 2016-06-09 08:33:39  a1
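An equivalent sketch (using the question's toy data) with DataFrame.duplicated and a subset, which marks rows whose (id, timestamp) pair appears more than once and avoids the groupby entirely:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['8:33:37', '8:33:37', '8:33:38',
                                 '8:33:39', '8:33:39', '8:33:37', '8:33:38'],
                   'id': ['a1', 'a1', 'a1', 'a1', 'a1', 'b1', 'b1']})

# keep=False marks every member of each duplicated (id, timestamp) group.
mask = df.duplicated(subset=['id', 'timestamp'], keep=False)
print(df[mask])  # rows 0, 1, 3, 4
```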

Upvotes: 1
