Julian Aßmann

Reputation: 310

Replace duplicates in a group with the majority vote in Pandas

I have this Pandas dataframe

image user answer
img_01 1 1
img_01 2 0
img_01 2 1
img_01 2 0
img_01 3 1
img_01 4 1
img_02 1 1
img_02 ... ...

As you can see, user 2 gave 3 answers in total for img_01, and they are not all the same. This happens throughout the dataset with different images and users. I know I can get the (image, user) combinations of these duplicates with:

g = dataset.groupby('image')['user'].value_counts()
g = g[g > 1]

Now I want to replace that user's 3 answers with the majority vote among them, or drop the user for that image entirely. How can I do that?

Upvotes: 0

Views: 96

Answers (1)

Quixotic22

Reputation: 2924

Feels like you may be overcomplicating it. I'd just count each (image, user, answer) combination, then keep the most frequent one per (image, user) pair.

df.value_counts().reset_index(name='count').drop_duplicates(subset=['image', 'user']).drop(columns='count')
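To make that concrete, here is a minimal sketch built on the sample rows from the question (the `img_02` row is the only one shown, so the frame is deliberately tiny). The variable names `counts`, `majority`, and `dropped` are just illustrative. It relies on `value_counts` sorting by count in descending order, so the first row per (image, user) pair is the majority answer; if two answers tie, which one survives is arbitrary. The last line shows the "drop him entirely" alternative from the question.

```python
import pandas as pd

# Sample data copied from the question (only the fully shown rows).
df = pd.DataFrame({
    "image": ["img_01"] * 6 + ["img_02"],
    "user": [1, 2, 2, 2, 3, 4, 1],
    "answer": [1, 0, 1, 0, 1, 1, 1],
})

# Count each (image, user, answer) row; the result is sorted by count, descending.
counts = df.value_counts().reset_index(name="count")

# Keep the first, i.e. most frequent, answer for every (image, user) pair.
majority = (
    counts.drop_duplicates(subset=["image", "user"])
          .drop(columns="count")
          .sort_values(["image", "user"])
          .reset_index(drop=True)
)

# Alternative: drop every (image, user) pair that answered more than once.
dropped = df[~df.duplicated(subset=["image", "user"], keep=False)].reset_index(drop=True)

print(majority)
print(dropped)
```

With this data, user 2 ends up with a single row for img_01 in `majority`, while `dropped` has no rows for user 2 on img_01 at all.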

Let me know if that makes sense.

Upvotes: 1
