letdatado

Reputation: 241

Working through duplicates along rows in DataFrame and deleting all except the last one in Python Pandas

I am stuck on a pandas data-cleaning task and have made a simple example to demonstrate the problem. Within each row, I want to remove the duplicates and keep only the last occurrence. My current DataFrame is 'animals', and I want it to become the DataFrame 'animals_clean'.

Imagine this DataFrame. There are duplicates within each row, e.g. 'cat' appears twice in row 0:

import pandas as pd
list_of_animals = [['cat','dog','monkey','sparrow', 'cat'],['cow', 'eagle','rat', 'eagle', 'owl'],['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)

This is the result I want: the duplicates in each row are marked 'X', keeping the last occurrence.

list_of_animals_clean = [['X','dog','monkey','sparrow', 'cat'],['cow', 'X','rat', 'eagle', 'owl'], ['deer', 'horse', 'goat', 'X', 'falcon']]
animals_clean = pd.DataFrame(list_of_animals_clean)

Upvotes: 0

Views: 68

Answers (1)

Henry Ecker

Reputation: 35676

Try apply + mask + duplicated with keep='last':

import pandas as pd

list_of_animals = [['cat', 'dog', 'monkey', 'sparrow', 'cat'],
                   ['cow', 'eagle', 'rat', 'eagle', 'owl'],
                   ['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)

animals = animals.apply(
    lambda s: s.mask(s.duplicated(keep='last'), 'x'),
    axis=1
)

print(animals)

Output:

      0      1       2        3       4
0     x    dog  monkey  sparrow     cat
1   cow      x     rat    eagle     owl
2  deer  horse    goat        x  falcon
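
To see why this works, it may help to look at the boolean flags that duplicated(keep='last') produces for a single row, and then check the full result against the table from the question. The following is only a minimal sketch: the names first_row_flags, expected_rows and matches are made up for illustration, and the replacement value is the uppercase 'X' from the question rather than the lowercase 'x' used above.

```python
import pandas as pd

animals = pd.DataFrame([
    ['cat', 'dog', 'monkey', 'sparrow', 'cat'],
    ['cow', 'eagle', 'rat', 'eagle', 'owl'],
    ['deer', 'horse', 'goat', 'falcon', 'falcon'],
])

# With keep='last', every occurrence except the last one is flagged True.
# In row 0 only the first 'cat' has a later copy, so only it is flagged.
first_row_flags = animals.iloc[0].duplicated(keep='last').tolist()

# axis=1 passes each row to the lambda as a Series;
# mask() replaces the flagged positions with 'X'.
result = animals.apply(
    lambda s: s.mask(s.duplicated(keep='last'), 'X'),
    axis=1
)

# Desired rows, copied from the question (illustrative variable name).
expected_rows = [
    ['X', 'dog', 'monkey', 'sparrow', 'cat'],
    ['cow', 'X', 'rat', 'eagle', 'owl'],
    ['deer', 'horse', 'goat', 'X', 'falcon'],
]
matches = result.values.tolist() == expected_rows
print(first_row_flags)
print(matches)
```

Note that apply with axis=1 runs the lambda once per row, so on a very wide or very long DataFrame it may be noticeably slower than a fully vectorized approach.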

Upvotes: 2
