I need a function, or a sentence, to identify duplicated rows in a group. For example, if we have this dataframe, grouping by column ID 1, I need see if values of ID 2 are duplicated within its group, or not: ID 1 ID 2 A 1 A 2 A 2 A 2 B 11 B 12 the desired output should be: ID 1 ID 2 Duplicated_ID 2_in_its_group(ID 1) A 1 false A 2 true A 2 true A 2 true B 11 false B 12 false Ideally, define a function. If not, a sentence to apply to the whole dataframe. Thanks. I tried a function with no result. Sorry.

Reputation: 29

python duplicated row in a group

I need a function, or a sentence, to identify duplicated rows in a group.

For example, if we have this dataframe, grouping by column ID 1, I need see if values of ID 2 are duplicated within its group, or not:

ID 1	ID 2
A	1
A	2
A	2
A	2
B	11
B	12

the desired output should be:

ID 1	ID 2	Duplicated_ID 2_in_its_group(ID 1)
A	1	false
A	2	true
A	2	true
A	2	true
B	11	false
B	12	false

Ideally, define a function. If not, a sentence to apply to the whole dataframe.

Thanks.

I tried a function with no result. Sorry.

Upvotes: 0

Answers (2)

Laszlo Hunyadi

Reputation: 1864

Group the DataFrame by ID 1
Check for duplicates within each group in ID 2using duplicated.
Mark the duplicated value as True and the non-duplicated as False.

Implementation:

import pandas as pd
def identify_duplicates(df):
    df['Duplicated_ID_2_in_its_group'] = df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))
    return df
data = {
    'ID 1': ['A', 'A', 'A', 'A', 'B', 'B'],
    'ID 2': [1, 2, 2, 2, 11, 12]
}
df = pd.DataFrame(data)
result_df = identify_duplicates(df)
print(result_df)

Upvotes: 1

Ömer Sezer

Reputation: 2669

Please use:

df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))

By setting keep on False, all duplicates are True.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

Sample Code:

import pandas as pd

data = {'ID 1': ['A', 'A', 'A', 'A', 'B', 'B'],
        'ID 2': [1, 2, 2, 2, 11, 12]}
df = pd.DataFrame(data)
df['Duplicated_ID_2_in_its_group(ID_1)'] = df.groupby('ID 1')['ID 2'].transform(lambda x: x.duplicated(keep=False))

print(df)

Output:

  ID 1  ID 2  Duplicated_ID_2_in_its_group(ID_1)
0    A     1                               False
1    A     2                                True
2    A     2                                True
3    A     2                                True
4    B    11                               False
5    B    12                               False

Upvotes: 2

python duplicated row in a group

Answers (2)

Related Questions