Pandas Drop Very First Duplicate only

Question

Let's say I have the following series.

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

I can keep the first duplicate (for each duplicate value) of the series with the following

s[s.duplicated(keep='first')]

I can keep the last duplicate (for each duplicate value) of the series with the following

s[s.duplicated(keep='last')]
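
For reference, here is a small sketch of what these masks select, using only the series from the question (the variable names are mine, not part of the original post). One thing worth noting: `duplicated(keep='first')` leaves the first occurrence of each value unmarked, so indexing with the mask returns the later repeats, and negating the mask is what keeps one row per value.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first' leaves the first occurrence of each value unmarked,
# so the mask selects the second and later occurrences.
later_repeats = s[s.duplicated(keep='first')]

# keep='last' leaves the last occurrence unmarked,
# so the mask selects every occurrence except the last one.
earlier_repeats = s[s.duplicated(keep='last')]

# Negating the mask keeps exactly one row per distinct value.
first_of_each = s[~s.duplicated(keep='first')]

print(later_repeats.index.tolist())    # [4, 5, 6, 10, 11, 13]
print(earlier_repeats.index.tolist())  # [3, 4, 5, 9, 10, 12]
print(first_of_each.tolist())          # [0, 1, 2, 3, 4, 5, 6, 7]
```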

However, I'm looking to do the following.

1. Drop only the very first duplicate, but keep the other duplicates of that value, and keep every duplicate of all other values (including the first one of each group). In the example above, we'd drop the first 3 but keep the other 3's, and leave everything else untouched.
2. Keep the first duplicate, but drop the other duplicates of that value, and keep every duplicate of all other values. In the example above, we'd keep the first 3 but drop all the other 3's, and leave everything else untouched.

I've been racking my brain using cumsum() and diff() to capture the change when a duplicate has been detected. I imagine a solution would involve this, but I can't seem to get a perfect solution. I've gone through too many truth tables right now...

Woody Pride · Accepted Answer

ind = s[s.duplicated()].index[0]

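gives you the label of the first row that duplicated() flags. With the default keep='first', the first occurrence of each value is left unflagged, so this is the second 3 (label 4). Use it to drop.

In [45]: s.drop(ind)
Out[45]:
0     0
1     1
2     2
3     3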
5     3
6     3
7     4
8     5
9     6
10    6
11    6
12    7
13    7
dtype: int64
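
The same idea can be wrapped in a small helper that also copes with a series containing no duplicates at all, where `.index[0]` would otherwise raise an `IndexError`. This is only a sketch: the function name `drop_first_duplicate` is hypothetical, and it assumes the index labels are unique (with repeated labels, `drop` would remove every row carrying that label).

```python
import pandas as pd

def drop_first_duplicate(s: pd.Series) -> pd.Series:
    """Return s without the first row flagged by duplicated(), if any."""
    dup_labels = s[s.duplicated()].index
    if len(dup_labels) == 0:
        # Nothing is repeated, so there is nothing to drop.
        return s.copy()
    return s.drop(dup_labels[0])

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
result = drop_first_duplicate(s)
print(result.tolist())  # [0, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7]
```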

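For part 2, there must be a neater solution, but the only one I can think of is to create two boolean masks, one marking where the index does not equal ind and one marking where the value equals the value at ind, and then combine them with np.logical_xor (with numpy imported as np). The XOR is True for every row holding a different value and for the row at ind itself, and False for the other rows holding that value:

s[np.logical_xor(s.index != ind, s == s.loc[ind])]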

Out[95]:
0     0
1     1
2     2
4     3
7     4
8     5
9     6
10    6
11    6
12    7
13    7
dtype: int64
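
A possible alternative for part 2 that avoids the XOR is to say directly which rows to keep: every row whose value differs from the first repeated value, plus one chosen occurrence of that value. The sketch below keeps the first occurrence, as the question asks. The function name is hypothetical, and it assumes unique index labels and that the repeated value is not NaN (since NaN never compares equal to itself).

```python
import pandas as pd

def keep_first_of_first_duplicated_value(s: pd.Series) -> pd.Series:
    """Keep one occurrence of the first repeated value; leave other values alone."""
    dups = s[s.duplicated()]
    if dups.empty:
        # Nothing is repeated, so there is nothing to collapse.
        return s.copy()
    value = dups.iloc[0]                 # first value that repeats (3 in the example)
    first_label = (s == value).idxmax()  # label of that value's first occurrence
    keep = (s != value) | (s.index == first_label)
    return s[keep]

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
result = keep_first_of_first_duplicated_value(s)
print(result.tolist())  # [0, 1, 2, 3, 4, 5, 6, 6, 6, 7, 7]
```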
