Reputation: 5823
Let's say I have the following series.
import pandas
s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
I can select every duplicate except the first occurrence (for each duplicated value) with the following
s[s.duplicated(keep='first')]
I can select every duplicate except the last occurrence (for each duplicated value) with the following
s[s.duplicated(keep='last')]
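For reference, a runnable sketch of those two selections (nothing beyond the Series from the question is assumed):

```python
import pandas as pd

# The series from the question
s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first' marks every occurrence after the first as a duplicate,
# so this selection returns the later repeats
after_first = s[s.duplicated(keep='first')]

# keep='last' marks every occurrence before the last as a duplicate,
# so this selection returns the earlier repeats
before_last = s[s.duplicated(keep='last')]

print(after_first.index.tolist())   # [4, 5, 6, 10, 11, 13]
print(before_last.index.tolist())   # [3, 4, 5, 9, 10, 12]
```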
However, I'm looking to do the following:

1. Drop the first duplicate 3, but keep the other 3's. Keep all other remaining duplicates.
2. Keep the first duplicate 3, but drop all other 3's. Keep all other remaining duplicates.

I've been racking my brain using cumsum() and diff() to capture the change when a duplicate is detected. I imagine a solution would involve those, but I can't seem to get a perfect one. I've gone through too many truth tables by now...
Upvotes: 5
Views: 3946
Reputation: 13955
ind = s[s.duplicated()].index[0]
gives you the first index where a record is duplicated. Use it to drop.
In [45]: s.drop(ind)
Out[45]:
0 0
1 1
2 2
3 3
5 3
6 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
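A quick runnable check (note that the first duplicated index in this series is 4, i.e. the second 3, since duplicated() leaves the first occurrence unmarked):

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# First index flagged by duplicated(): the second 3, at position 4
ind = s[s.duplicated()].index[0]
print(ind)  # 4

# Dropping it keeps the 3 at index 3 and the later 3's at 5 and 6
print(s.drop(ind).index.tolist())
# [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```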
For part 2, there must be a neater solution, but the only one I can think of is to create a series of bools indicating where the index does not equal ind and the value does equal the value at ind, and then use np.logical_xor:
s[np.logical_xor(s.index != ind, s==s.iloc[ind])]
Out[95]:
0 0
1 1
2 2
4 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
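A self-contained sketch of that XOR mask, with comments spelling out why exactly one 3 survives (ind is 4 here, as above):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
ind = s[s.duplicated()].index[0]  # 4

# For each row, test two conditions:
#   a: the index is not ind         (True everywhere except row 4)
#   b: the value equals s.iloc[ind] (True for every 3: rows 3-6)
# a XOR b keeps row 4 itself (False ^ True), drops the other 3's at
# rows 3, 5, 6 (True ^ True), and keeps every non-3 row (True ^ False).
kept = s[np.logical_xor(s.index != ind, s == s.iloc[ind])]

print(kept.index.tolist())  # [0, 1, 2, 4, 7, 8, 9, 10, 11, 12, 13]
```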
Upvotes: 6
Reputation: 294338
Use:

- duplicated() to get dups after the first one
- duplicated(keep=False) to get all dups, including the first one
- xor, or ^, to find where it's just the first dup

Note that this drops the first 6 as well.

s[~(s.duplicated(keep=False) ^ s.duplicated())]
0 0
1 1
2 2
4 3
5 3
6 3
7 4
8 5
10 6
11 6
13 7
dtype: int64
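A runnable sketch showing that the XOR of the two masks isolates exactly the first occurrence of each repeated value (indices 3, 9, and 12 here):

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# True wherever a value is duplicated at all, including first occurrences
all_dups = s.duplicated(keep=False)
# True only for occurrences after the first
later_dups = s.duplicated()

# The masks disagree exactly at the first occurrence of each repeated value
first_of_each = all_dups ^ later_dups
print(s.index[first_of_each].tolist())  # [3, 9, 12]

# Negating the XOR drops those rows and keeps everything else
print(s[~first_of_each].index.tolist())
# [0, 1, 2, 4, 5, 6, 7, 8, 10, 11, 13]
```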
Upvotes: 4