Reputation: 310
I have this pandas DataFrame:
image | user | answer |
---|---|---|
img_01 | 1 | 1 |
img_01 | 2 | 0 |
img_01 | 2 | 1 |
img_01 | 2 | 0 |
img_01 | 3 | 1 |
img_01 | 4 | 1 |
img_02 | 1 | 1 |
img_02 | ... | ... |
As you can see, user 2 gave 3 answers in total for img_01, but not always the same one. This happens throughout the dataset with different images and users. I know I can find the (image, user) combinations of these duplicates with:
g = dataset.groupby('image')['user'].value_counts()
g = g[g > 1]
Now I want to replace his 3 answers with the majority vote among his answers or drop him entirely. How can I do that?
Upvotes: 0
Views: 96
Reputation: 2924
Feels like you may be overcomplicating it. I'd just count each (image, user, answer) combination, then keep the most frequent answer for each (image, user) pair.
df.value_counts().reset_index(name='count').drop_duplicates(subset=['image', 'user']).drop(columns='count')
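Here is a minimal sketch of that idea on the img_01 rows from the question. The variable names `counts` and `majority` are only illustrative, and naming the count column explicitly with `reset_index(name="count")` is my choice so the column has a known name regardless of pandas version. Note that ties (for example one 0 and one 1 from the same user) are resolved by whichever row happens to come first after sorting, so handle those separately if they matter.

```python
import pandas as pd

# Sample rows copied from the question's table (img_01 only).
df = pd.DataFrame({
    "image": ["img_01"] * 6,
    "user": [1, 2, 2, 2, 3, 4],
    "answer": [1, 0, 1, 0, 1, 1],
})

# Count each (image, user, answer) triple.
# DataFrame.value_counts sorts by count in descending order by default.
counts = df.value_counts().reset_index(name="count")

# The first row per (image, user) pair is therefore the most frequent answer.
majority = (
    counts.drop_duplicates(subset=["image", "user"])
    .drop(columns="count")
    .sort_values(["image", "user"])
    .reset_index(drop=True)
)

print(majority)
```

User 2 answered 0 twice and 1 once, so their three rows collapse to a single row with answer 0.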
Let me know if that makes sense.
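If you would rather drop such users instead, here is a rough sketch. I'm assuming "drop" means removing every row of a user for the image where they answered more than once, not removing that user from the whole dataset. The names `repeated` and `single_answers` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question's table (img_01 only).
df = pd.DataFrame({
    "image": ["img_01"] * 6,
    "user": [1, 2, 2, 2, 3, 4],
    "answer": [1, 0, 1, 0, 1, 1],
})

# keep=False flags every row of a repeated (image, user) pair,
# not only the second and later occurrences.
repeated = df.duplicated(subset=["image", "user"], keep=False)

# Keep only the rows whose (image, user) pair occurs exactly once.
single_answers = df[~repeated].reset_index(drop=True)

print(single_answers)
```

On the sample rows this removes all three of user 2's answers for img_01 and leaves users 1, 3 and 4 untouched.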
Upvotes: 1