Reputation: 213

pandas: Iterate over duplicate rows to check for unique values

Given a dataframe as follows:

 col1 col2
a  0  True 
b  0  True
c  1  True
d  1  False
e  2  False
f  2  False
g  3  True

For each unique value in col1, I would like to check if all values in col2 are matching, otherwise drop all rows for that corresponding value such that this would yield:

 col1 col2
a  0  True 
b  0  True
e  2  False
f  2  False
g  3  True

Upvotes: 1

Answers (3)

HelpfulHound

Reputation: 326

What have you tried? Seems like a reasonably straightforward problem. I'd use shape and drop_duplicates():

data=[
 col1 col2
a  0  True 
b  0  True
c  1  True
d  1  False
e  2  False
f  2  False
g  3  True
]

cols=["col1","col2"]
df=pd.DataFrame(data, columns=cols)

for ind, row in df.drop_duplicates(subset=["col1"])["col1"].iteritems():
    df1=df[df["col1"]==row]
    if df1.shape[0] == df1.drop_duplicates().shape[0]:
        #logic goes here

If the shapes match after dropping duplicates, they're different values. If not, then drop the whole subset and don't build the new dataframe with those rows.

Upvotes: -1

Quang Hoang

Reputation: 150785

You want nunique:

df[df.groupby('col1')['col2'].transform('nunique').eq(1)]

Output:

   col1   col2
a     0   True
b     0   True
e     2  False
f     2  False
g     3   True

Upvotes: 2

Valdi_Bo

Reputation: 31001

One of possible solutions: Group by col1 and filter each group checking whether all col2 values are True or all of them are False:

df.groupby('col1').filter(lambda x: x.col2.all() | (~x.col2).all())

Upvotes: 0

pandas: Iterate over duplicate rows to check for unique values

Answers (3)

Related Questions