Reputation: 766
I have the following dataframe, from which I need to drop consecutive duplicate values only if they equal 0.3 or 0.4.
In [2]: df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                          data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})
In [3]: df
Out[3]:
poll_support
2002-01-01 0.3
2002-01-02 0.4
2002-01-03 0.4
2002-01-04 0.4
2002-01-05 0.3
2002-01-06 0.5
2002-01-07 0.5
I need the df to look like this:
2002-01-01 0.3
2002-01-02 0.4
2002-01-05 0.3
2002-01-06 0.5
2002-01-07 0.5
I tried:
for var in df['poll_support']:
    if var == 0.3 or var == 0.4:
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.3]
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.4]
However, this does not produce the desired df.
I would love to hear suggestions.
Upvotes: 1
Views: 42
Reputation: 26676
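Boolean indexing will help. Build a mask for rows that repeat the previous row's value and are also 0.3 or 0.4, then keep everything outside that mask. Try: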
df[~((df['poll_support']==df['poll_support'].shift())&(df['poll_support'].isin([0.3,0.4])))]
poll_support
2002-01-01 0.3
2002-01-02 0.4
2002-01-05 0.3
2002-01-06 0.5
2002-01-07 0.5
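For readability, the same one-liner can be split into named steps. This is only a sketch of the approach above using the question's data; the variable names `s`, `repeats_previous`, `is_target`, and `result` are introduced here for illustration and are not from the original post.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    index=pd.date_range('20020101', periods=7, freq='D'),
    data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]},
)

s = df['poll_support']

# True where a row has the same value as the row directly before it.
repeats_previous = s.eq(s.shift())

# True where the value is one of the values we are allowed to de-duplicate.
is_target = s.isin([0.3, 0.4])

# Drop only rows that satisfy both conditions; keep everything else.
result = df[~(repeats_previous & is_target)]
print(result)
```

Note that this relies on exact float equality, which works here because the values are literals. If the column came from arithmetic, a tolerance-based comparison would likely be safer.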
Upvotes: 1