bobo T

Reputation: 157

Pandas: delete consecutive duplicates but keep the first and last value

So I have a set of values in a column that looks like this:

1 0 2 1 1 0 0 0 0 0 1 2 0 0 0 0 4 

I'm trying to delete the repeating zeros but keep the first and last ones. End result should look like this:

1 0 2 1 1 0 0 1 2 0 0 4

drop_duplicates won't work because it removes every repeated value in the column, not just the zeros inside each separate consecutive run.

df = df.loc[df.people.shift() != df.people]

This works well, but it keeps only the first value of each consecutive run and drops the last one.


Upvotes: 4

Views: 1604

Answers (2)

BENY

Reputation: 323326

Using ffill and bfill with limit=1 (s is the column as a Series, and numpy is imported as np):

s[s.replace(0,np.nan).ffill(limit=1).bfill(limit=1).notnull()]
Out[387]: 
0     1
1     0
2     2
3     1
4     1
5     0
9     0
10    1
11    2
12    0
15    0
16    4
dtype: int64
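
For readers who find the one-liner above dense, here is a step-by-step sketch of the same idea. The intermediate names (masked, filled, keep, result) are only illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0, 2, 1, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 4])

# Step 1: turn zeros into NaN so that the fills can reach into each run of zeros.
masked = s.replace(0, np.nan)

# Step 2: forward-fill at most one position; this covers the first zero of each run.
filled = masked.ffill(limit=1)

# Step 3: back-fill at most one position; this covers the last zero of each run.
filled = filled.bfill(limit=1)

# Step 4: anything still NaN is an interior zero, so keep everything else.
keep = filled.notna()
result = s[keep]
print(result.tolist())
```

One thing to watch: a run of zeros at the very start of the Series has nothing before it to forward-fill from (and a run at the very end has nothing after it to back-fill from), so the outermost zero of such a run would be dropped. It also assumes the column has no NaN values of its own.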

Upvotes: 6

user3483203

Reputation: 51165

Setup

s = pd.Series([1, 0, 2, 1, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 4])

You can use boolean indexing with shift to flag every element that is zero and equal to both its previous and its next neighbor. Such an element sits in the middle of a run of zeros and is neither the first nor the last one. Negating that mask keeps everything else.

s[~((s==0) & (s == s.shift(1)) & (s == s.shift(-1)))]

Output:

0     1
1     0
2     2
3     1
4     1
5     0
9     0
10    1
11    2
12    0
15    0
16    4
dtype: int64
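
As a rough sketch, the same mask can be applied to the DataFrame from the question. The column name people comes from the question; the DataFrame contents and the variable names (interior_zero, trimmed, starts_run, ends_run, trimmed_any) are assumptions for illustration. The second variant is not part of the answer above: it keeps the first and last element of every run, whatever the repeated value is.

```python
import pandas as pd

# Assumed reconstruction of the question's data in a column named "people".
df = pd.DataFrame({"people": [1, 0, 2, 1, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 4]})
col = df["people"]

# Zeros whose previous and next neighbors are also zero are interior zeros.
interior_zero = (col == 0) & (col == col.shift(1)) & (col == col.shift(-1))
trimmed = df.loc[~interior_zero]

# Variant for any value: keep a row if it starts a run or ends a run.
starts_run = col != col.shift(1)   # differs from the previous row
ends_run = col != col.shift(-1)    # differs from the next row
trimmed_any = df.loc[starts_run | ends_run]

print(trimmed["people"].tolist())
```

On this data both variants select the same rows, because the only runs longer than two elements are runs of zeros. They would differ on a column containing, say, three consecutive 1s.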

Upvotes: 2
