erwachen
erwachen

Reputation: 21

Split with pandas dataframe column duplicate values into two dataframes, one with duplicate, one without duplicate

eg.

INPUT: one dataframe

   Name     id     Price
   Apple     01       13.86
   Cherry    02       13.24
   Banana    02       1.99
   Peach     03       14.76
   Orange    04       2.48

OUTPUT: two dataframes

one with with duplicate dataframe[id]:

   Name     id     Price
   Cherry    02       13.24
   Banana    02       1.99

other without duplicate dataframe[id]:

   Name     id     Price
   Apple     01       13.86
   Peach     03       14.76
   Orange    04       2.48

Many thanks

Upvotes: 0

Views: 1072

Answers (3)

SM1312
SM1312

Reputation: 588

noDuplicate = data.drop_duplicates('id', keep=False)
print("No Duplicates:", noDuplicate)

duplicate = data[data['id'].duplicated(keep=False)]
print("Duplicates:", duplicate)

Upvotes: 1

erwachen
erwachen

Reputation: 21

INPUT: df; OUTPUT: df_duplicated, df_unique

df_duplicated = df[df['id'].duplicated(keep=False)]
df_unique = pd.concat([df, df_duplicated]).drop_duplicates(keep=False)

print(df_duplicated)
print(df_unique)

Upvotes: 1

user7375116
user7375116

Reputation: 213

You can count the occurrence of each unique identifier and then merge the result on your dataframe to get the unique and duplicate values.

As an example:

df = pd.DataFrame(data={'Id': [1, 2, 2, 3, 4]})
agg_df = df.groupby(by='Id').agg(count=('Id', 'count'))
agg_df.reset_index(inplace=True)
filtered_df = agg_df.loc[agg_df['count'] == 1].merge(df, on=['Id'])
unique_df = agg_df.loc[agg_df['count'] > 1].merge(df, on=['Id'])

Upvotes: 0

Related Questions