saifhassan

Reputation: 386

How to retrieve rows based on duplicate values in a specific column in pandas (Python)?

Let's say we have the following data:

 A       B
123     John
456     Mary
102     Allen
456     Nickolan
123     Richie    
167     Daniel

We want to retrieve rows based on column A: rows that share the same value of A should be stored together in a separate dataframe named after that value.

[123 John, 123 Richie]: both will be stored in df_123
[456 Mary, 456 Nickolan]: both will be stored in df_456
[102 Allen] will be stored in df_102
[167 Daniel] will be stored in df_167

Thanks in Advance

Upvotes: 1

Views: 363

Answers (2)

jpp

Reputation: 164833

groupby + tuple + dict

Creating a variable number of variables is not recommended. You can use a dictionary:

dfs = dict(tuple(df.groupby('A')))

And that's it. To access the dataframe where A == 123, use dfs[123], etc.

Note that your dataframes are now distinct objects. You can no longer perform an operation on dfs and have it applied to every dataframe it holds without a Python-level loop.
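
As a minimal sketch of the dictionary approach, assuming the question's sample data is loaded into a dataframe named df (the construction of df and the name sizes are assumptions made for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [123, 456, 102, 456, 123, 167],
    "B": ["John", "Mary", "Allen", "Nickolan", "Richie", "Daniel"],
})

# Iterating a groupby yields (key, sub-dataframe) pairs,
# so dict() maps each value of A to its dataframe.
dfs = dict(tuple(df.groupby("A")))

# One entry per distinct value of A.
print(sorted(int(key) for key in dfs))  # [102, 123, 167, 456]
print(dfs[123]["B"].tolist())           # ['John', 'Richie']

# Applying an operation to every dataframe needs an explicit loop,
# here a dict comprehension counting the rows in each group.
sizes = {key: len(group) for key, group in dfs.items()}
```

Looking up dfs[123] plays the role of the df_123 variable the question asks for, without creating variable names at runtime.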

Upvotes: 2

It_is_Chris

Reputation: 14113

Group by column A, then use a list comprehension to build a list of dataframes, one per group:

group = df.groupby('A')
dfs = [group.get_group(x) for x in group.groups]

Output (apparently from a variant of the sample data in which Allen and Daniel both have A = 112, unlike the question's data):

[     A       B
 2  112   Allen
 5  112  Daniel,      A       B
 0  123    John
 4  123  Richie,      A         B
 1  456      Mary
 3  456  Nickolan]
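
A possible variant, offered as a sketch rather than as this answer's own method: iterating over the groupby object directly yields each key together with its sub-dataframe, so the same list can be built without calling get_group. The df below is constructed from the question's sample data, which is an assumption about how the data is loaded.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [123, 456, 102, 456, 123, 167],
    "B": ["John", "Mary", "Allen", "Nickolan", "Richie", "Daniel"],
})

# Each iteration step yields (key, sub-dataframe); keep only the dataframe.
dfs = [group for _, group in df.groupby("A")]

# Groups come back ordered by key, since groupby sorts keys by default.
print(len(dfs))               # 4
print(dfs[1]["B"].tolist())   # ['John', 'Richie']
```

With the question's data this gives four dataframes, one each for 102, 123, 167 and 456. A list loses the key-based lookup that a dictionary offers, so it suits cases where the groups are only processed in sequence.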

Upvotes: 2
