python duplicated row in a group

I need a function, or a sentence, to identify duplicated rows in a group.

For example, if we have this dataframe, grouping by column ID 1, I need see if values of ID 2 are duplicated within its group, or not:

ID 1 ID 2
A 1
A 2
A 2
A 2
B 11
B 12

the desired output should be:

ID 1 ID 2 Duplicated_ID 2_in_its_group(ID 1)
A 1 false
A 2 true
A 2 true
A 2 true
B 11 false
B 12 false

Ideally, define a function. If not, a sentence to apply to the whole dataframe.

Thanks.

I tried a function with no result. Sorry.

Upvotes: 0

Views: 62

Answers (2)

Laszlo Hunyadi
Laszlo Hunyadi

Reputation: 1864

  1. Group the DataFrame by ID 1
  2. Check for duplicates within each group in ID 2using duplicated.
  3. Mark the duplicated value as True and the non-duplicated as False.

Implementation:

import pandas as pd
def identify_duplicates(df):
    df['Duplicated_ID_2_in_its_group'] = df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))
    return df
data = {
    'ID 1': ['A', 'A', 'A', 'A', 'B', 'B'],
    'ID 2': [1, 2, 2, 2, 11, 12]
}
df = pd.DataFrame(data)
result_df = identify_duplicates(df)
print(result_df)

Upvotes: 1

Ömer Sezer
Ömer Sezer

Reputation: 2669

Please use:

df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))

By setting keep on False, all duplicates are True.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

Sample Code:

import pandas as pd

data = {'ID 1': ['A', 'A', 'A', 'A', 'B', 'B'],
        'ID 2': [1, 2, 2, 2, 11, 12]}
df = pd.DataFrame(data)
df['Duplicated_ID_2_in_its_group(ID_1)'] = df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))

print(df)

Output:

  ID 1  ID 2  Duplicated_ID_2_in_its_group(ID_1)
0    A     1                               False
1    A     2                                True
2    A     2                                True
3    A     2                                True
4    B    11                               False
5    B    12                               False

Upvotes: 2

Related Questions