Reputation: 361
I have the following example of a dataframe:
test = pd.DataFrame({'type': ['fruit', 'fruit', 'fruit', 'fruit', 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
'name': ['orange', 'orange', 'apple', 'grape', 'leek', 'spinach', 'spinach', 'spinach'],
'value': ['100', '200', '300', '400', '500', '600', '700', '700']})
I want to remove consecutive duplicate rows that are only duplicates for:
test['type']
test['name']
I've seen that a common solution is to use shift if it was for the entire dataframe or an individual column
# for a particular column
test[test.name != test.name.shift()]
I there anyway to have it span 2 columns?
Thank you so much for any insight
Upvotes: 1
Views: 44
Reputation: 26686
Another way but may not work if row is duplicated one after another is;
Set as multi index
test.set_index(['type','name'], inplace=True)
Drop multiindex duplicates
test = test.loc[~test.index.duplicated(keep='first')]
Reset index
test.reset_index(inplace=True)
Upvotes: 1
Reputation: 75150
2 Columns with an any
on axis=1
:
test[test[['type','name']].ne(test[['type','name']].shift()).any(1)]
type name value
0 fruit orange 100
2 fruit apple 300
3 fruit grape 400
4 vegetable leek 500
5 vegetable spinach 600
Upvotes: 3
Reputation: 294536
tuple
s = pd.Series(zip(test.type, test.name), test.index)
test[s != s.shift()]
type name value
0 fruit orange 100
2 fruit apple 300
3 fruit grape 400
4 vegetable leek 500
5 vegetable spinach 600
Upvotes: 5