arkadiy

Drop only specific consecutive duplicates in a pandas DataFrame

I have the following dataframe, from which I need to drop consecutive duplicate values, but only when the repeated value is 0.3 or 0.4 (consecutive duplicates of any other value, such as 0.5, must stay).

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                              data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})

In [3]: df
Out[3]:
                poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-03           0.4
2002-01-04           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I need the df to look like this:

                poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I tried:

for var in df['poll_support']:
    if var == 0.3 or var == 0.4:
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.3]
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.4]

However, this does not produce the desired df.

I would love to hear suggestions.
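To see why the attempt falls short, here is a sketch that runs only the first filter line once, outside the loop, on the same data. The condition compares the previous row's value with a constant (0.3) rather than with the current row, and assigning the shorter filtered Series back to the column realigns it on the index, so the filtered-out dates come back as NaN instead of being removed.

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.date_range("20020101", periods=7, freq="D"),
    data={"poll_support": [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]},
)

# One step of the attempt: keep rows whose *previous* value is not 0.3.
# This tests the previous row against a constant, not against the current row.
filtered = df["poll_support"].loc[df["poll_support"].shift() != 0.3]

# Assigning the shorter Series back to the column aligns on the index,
# so the removed dates reappear as NaN instead of disappearing.
df["poll_support"] = filtered

print(len(df))                          # still 7 rows
print(df["poll_support"].isna().sum())  # 2 NaNs: 2002-01-02 and 2002-01-06
```

Note that 2002-01-02 (a row that should be kept) is blanked out simply because it follows a 0.3.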

Answers (1)

wwnde

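Boolean indexing will help: flag the rows whose value equals the previous row's value and is 0.3 or 0.4, then keep everything else. Try: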

mask = (df['poll_support'] == df['poll_support'].shift()) & df['poll_support'].isin([0.3, 0.4])
df[~mask]




             poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5
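The same mask can be wrapped in a small reusable function. This is a sketch: drop_consecutive_dupes and its atol parameter are hypothetical names, not part of pandas. With atol=None it reproduces the exact ==/isin comparison above; passing a tolerance switches to numpy.isclose, which may help if the values come from arithmetic (for example 0.1 + 0.2 is not exactly equal to 0.3, so an exact test would miss it).

```python
import numpy as np
import pandas as pd


def drop_consecutive_dupes(df, col, values, atol=None):
    """Drop rows whose `col` value repeats the previous row's value,
    but only when that value is one of `values` (hypothetical helper)."""
    s = df[col]
    prev = s.shift()  # previous row's value; NaN for the first row

    if atol is None:
        # Exact comparison, same as the one-liner in the answer.
        same_as_prev = s == prev
        is_target = s.isin(values)
    else:
        # Tolerant comparison (assumption: an absolute tolerance suits the data).
        # np.isclose treats NaN as not close, so the first row is never a repeat.
        same_as_prev = pd.Series(
            np.isclose(s.to_numpy(), prev.to_numpy(), rtol=0, atol=atol),
            index=s.index,
        )
        # Compare every value against every target value, then reduce per row.
        is_target = pd.Series(
            np.isclose(
                s.to_numpy()[:, None], np.asarray(values)[None, :], rtol=0, atol=atol
            ).any(axis=1),
            index=s.index,
        )

    # Keep a row unless it is both a repeat and one of the target values.
    return df[~(same_as_prev & is_target)]


df = pd.DataFrame(
    index=pd.date_range("20020101", periods=7, freq="D"),
    data={"poll_support": [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]},
)
out = drop_consecutive_dupes(df, "poll_support", [0.3, 0.4])
print(out)
```

Each row is compared with the original previous row, not with the previous kept row, so a run such as 0.4, 0.4, 0.4 collapses to a single 0.4, while the repeated 0.5 is left alone.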
