Reputation: 241
I am stuck on data cleaning in pandas. I have made a simple example to demonstrate my problem. Within each row, I want to replace duplicated values with a marker and keep only the last occurrence. Currently, my DataFrame is 'animals', and I want it to become the DataFrame 'animals_clean'.
Imagine this DataFrame. You can see duplicates within each row (across columns, i.e. along axis=1), e.g. 'cat' is repeated in row 0.
import pandas as pd
list_of_animals = [['cat','dog','monkey','sparrow', 'cat'],['cow', 'eagle','rat', 'eagle', 'owl'],['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)
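How it looks:
      0      1       2        3       4
0   cat    dog  monkey  sparrow     cat
1   cow  eagle     rat    eagle     owl
2  deer  horse    goat   falcon  falcon
This is the result I want. You can see that the duplicates in each row are marked 'X', keeping only the last occurrence.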
list_of_animals_clean = [['X','dog','monkey','sparrow', 'cat'],['cow', 'X','rat', 'eagle', 'owl'], ['deer', 'horse', 'goat', 'X', 'falcon']]
animals_clean = pd.DataFrame(list_of_animals_clean)
Should look like:
      0      1       2        3       4
0     X    dog  monkey  sparrow     cat
1   cow      X     rat    eagle     owl
2  deer  horse    goat        X  falcon
Upvotes: 0
Views: 68
Reputation: 35676
Try DataFrame.apply along axis=1 with Series.mask and Series.duplicated(keep='last'):
import pandas as pd

list_of_animals = [['cat', 'dog', 'monkey', 'sparrow', 'cat'],
                   ['cow', 'eagle', 'rat', 'eagle', 'owl'],
                   ['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)
animals = animals.apply(
    lambda s: s.mask(s.duplicated(keep='last'), 'x'),
    axis=1
)
print(animals)
Output:
      0      1       2        3       4
0     x    dog  monkey  sparrow     cat
1   cow      x     rat    eagle     owl
2  deer  horse    goat        x  falcon
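To see why this works, it may help to run the two steps on a single row first. The sketch below is only an illustration: the variable names (row, is_earlier_duplicate, masked_row, expected, result) are made up for this example, and it uses the 'X' placeholder from the question. With axis=1, apply passes each row to the lambda as a Series, so the same two calls can be tried on one Series by hand.

```python
import pandas as pd

# First row of the example data, as the Series that apply(axis=1) would pass in.
row = pd.Series(['cat', 'dog', 'monkey', 'sparrow', 'cat'])

# duplicated(keep='last') is True for every occurrence of a value except its last one.
is_earlier_duplicate = row.duplicated(keep='last')
print(is_earlier_duplicate.tolist())  # [True, False, False, False, False]

# mask replaces the entries where the condition is True and keeps the others.
masked_row = row.mask(is_earlier_duplicate, 'X')
print(masked_row.tolist())  # ['X', 'dog', 'monkey', 'sparrow', 'cat']

# The same steps applied row by row, compared with the frame the question asks for.
animals = pd.DataFrame([['cat', 'dog', 'monkey', 'sparrow', 'cat'],
                        ['cow', 'eagle', 'rat', 'eagle', 'owl'],
                        ['deer', 'horse', 'goat', 'falcon', 'falcon']])
expected = pd.DataFrame([['X', 'dog', 'monkey', 'sparrow', 'cat'],
                         ['cow', 'X', 'rat', 'eagle', 'owl'],
                         ['deer', 'horse', 'goat', 'X', 'falcon']])
result = animals.apply(lambda s: s.mask(s.duplicated(keep='last'), 'X'), axis=1)

# Compare plain values so that index or dtype details do not affect the check.
print(result.to_numpy().tolist() == expected.to_numpy().tolist())
```

Note that apply with axis=1 calls the lambda once per row, which is fine for small frames like this one but can be slow on very large ones.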
Upvotes: 2