IcemanBerlin
IcemanBerlin

Reputation: 3437

Python Pandas: Eliminate a row from a dataframe if a value in a any preceding row in a groupby meets a certain criteria

I am trying to remove data from a groupby once the Week becomes non-sequential by more than 1. i.e. If there is a gap in a week then i want to remove that row and subsequent rows in that group by. below is a simple example of the sort of structure of data I have and also the ideal output I am looking for. The data needs to be grouped by Country and Product.

import pandas as pd

data = {'Country' : ['US','US','US','US','US','DE','DE','DE','DE','DE'],'Product' :         ['Coke','Coke','Coke','Coke','Coke','Apple','Apple','Apple','Apple','Apple'],'Week' : [1,2,3,4,6,1,2,3,5,6] }

df = pd.DataFrame(data)

print df

#Current starting Dataframe.
  Country Product  Week
0      US    Coke     1
1      US    Coke     2
2      US    Coke     3
3      US    Coke     4
4      US    Coke     6
5      DE   Apple     1
6      DE   Apple     2
7      DE   Apple     3
8      DE   Apple     5
9      DE   Apple     6

#Ideal Output below:
  Country Product  Week
0      US    Coke     1
1      US    Coke     2
2      US    Coke     3
3      US    Coke     4
5      DE   Apple     1
6      DE   Apple     2
7      DE   Apple     3

So the output removes Week 6 for the US Coke because the preceding week was 4. For DE Apple Week 5 & 6 was removed because the preceding Week to Week 5 was 3. note this also eliminates DE Apple Week 6 even though its preceding is 5 or a diff() of 1.

Upvotes: 1

Views: 184

Answers (2)

user1827356
user1827356

Reputation: 7022

This should work

df.groupby(['Country', 'Product']).apply(lambda sdf: sdf[(sdf.Week.diff(1).fillna(1) != 1).astype('int').cumsum() == 0]).reset_index(drop=True)

Another method, that might be more readable (i.e. generate a set of consecutive weeks and check against the observed week)

df['expected_week'] = df.groupby(['Country', 'Product']).Week.transform(lambda s: range(s.min(), s.min() + s.size))
df[df.Week == df.expected_week]

Upvotes: 1

ZJS
ZJS

Reputation: 4051

You could try out this method...

def eliminate(x):
    x['g'] = x['Week'] - np.arange(x.shape[0])
    x = x[x['g'] == x['g'].min()]
    x = x.drop('g',1)

    return x

out  = df.groupby('Product').apply(eliminate).reset_index(level=0,drop=True)

Upvotes: 1

Related Questions