bls
bls

Reputation: 361

Dropping consecutive duplicate rows, duplication only for 2 columns

I have the following example of a dataframe:

test = pd.DataFrame({'type': ['fruit', 'fruit', 'fruit', 'fruit', 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
                    'name': ['orange', 'orange', 'apple', 'grape', 'leek', 'spinach', 'spinach', 'spinach'],
                    'value': ['100', '200', '300', '400', '500', '600', '700', '700']})

I want to remove consecutive duplicate rows that are only duplicates for:

test['type']

test['name']

I've seen that a common solution is to use shift if it was for the entire dataframe or an individual column

# for a particular column

test[test.name != test.name.shift()]

I there anyway to have it span 2 columns?

Thank you so much for any insight

Upvotes: 1

Views: 44

Answers (3)

wwnde
wwnde

Reputation: 26686

Another way but may not work if row is duplicated one after another is;

Set as multi index

 test.set_index(['type','name'], inplace=True)

Drop multiindex duplicates

test = test.loc[~test.index.duplicated(keep='first')]

Reset index

 test.reset_index(inplace=True)

Upvotes: 1

anky
anky

Reputation: 75150

2 Columns with an any on axis=1:

test[test[['type','name']].ne(test[['type','name']].shift()).any(1)]

        type     name value
0      fruit   orange   100
2      fruit    apple   300
3      fruit    grape   400
4  vegetable     leek   500
5  vegetable  spinach   600

Upvotes: 3

piRSquared
piRSquared

Reputation: 294536

Simple with tuple

s = pd.Series(zip(test.type, test.name), test.index)
test[s != s.shift()]

        type     name value
0      fruit   orange   100
2      fruit    apple   300
3      fruit    grape   400
4  vegetable     leek   500
5  vegetable  spinach   600

Upvotes: 5

Related Questions