pandas remove records conditionally based on records count of groups

Question

I have a DataFrame like this:

import pandas as pd
import numpy as np

raw_data = {'Country': ['UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK','UK'],
            'Product': ['A','A','A','A','B','B','B','B','B','B','B','B','C','C','C','D','D','D','D','D','D'],
            'Week': [1,2,3,4,1,2,3,4,5,6,7,8,1,2,3,1,2,3,4,5,6],
            'val': [5,4,3,1,5,6,7,8,9,10,11,12,5,5,5,5,6,7,8,9,10]}

df2 = pd.DataFrame(raw_data, columns=['Country', 'Product', 'Week', 'val'])

print(df2)

and a mapping DataFrame:

mapping = pd.DataFrame({'Product':['A','C'],'Product1':['B','D']}, columns = ['Product','Product1'])

I want to compare products according to the mapping: product A's data should be matched with product B's data. The logic is that product A has 4 records, so product B should also keep 4 records, taken from around product A's last week number and including that week. Product A's last week is 4, so product B keeps 1 week before it (week 3), week 4 itself, and 2 weeks after it (weeks 5 and 6).

Similarly, product C has 3 records, so product D should also keep 3 records around product C's last week number. Product C's last week is 3, so product D keeps weeks 2, 3 and 4.

The wanted DataFrame keeps only those records; I want to remove all the others (the ones highlighted in yellow).
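The rule described in the question can be sketched as a small calculation over group counts. This is a minimal sketch of one reading of the rule, assuming the kept window starts one week before the source product's last (highest) week and spans as many weeks as the source product has records; `stats` and `target_weeks` are hypothetical names used only for illustration.

```python
import pandas as pd

# Same data as the question's raw_data, written compactly
raw_data = {
    'Country': ['UK'] * 21,
    'Product': list('A' * 4 + 'B' * 8 + 'C' * 3 + 'D' * 6),
    'Week': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 1, 2, 3, 4, 5, 6],
    'val': [5, 4, 3, 1, 5, 6, 7, 8, 9, 10, 11, 12, 5, 5, 5, 5, 6, 7, 8, 9, 10],
}
df2 = pd.DataFrame(raw_data)
mapping = pd.DataFrame({'Product': ['A', 'C'], 'Product1': ['B', 'D']})

# Record count and last (highest) week for every product
stats = df2.groupby('Product')['Week'].agg(n='count', last_week='max')

def target_weeks(src):
    """Hypothetical helper: weeks the mapped product keeps for source product `src`."""
    n = int(stats.at[src, 'n'])
    start = int(stats.at[src, 'last_week']) - 1  # one week before the last week
    return list(range(start, start + n))        # n consecutive weeks from there

for src, dst in zip(mapping['Product'], mapping['Product1']):
    print(src, '->', dst, target_weeks(src))
# A -> B [3, 4, 5, 6]
# C -> D [2, 3, 4]
```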

Valdi_Bo · Accepted Answer

Define the following function, which selects rows from df for the pair of products in the current row of mapping:

def selRows(row, df):
    rows_1 = df[df.Product == row.Product]
    nr_1 = rows_1.index.size
    lastWk_1 = rows_1.Week.iat[-1]
    rows_2 = df[df.Product.eq(row.Product1) & df.Week.ge(lastWk_1 - 1)].iloc[:nr_1]
    return pd.concat([rows_1, rows_2])
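To see what selRows computes, the sketch below traces its intermediate values for the first mapping row (A to B) directly on the question's data. It repeats the function body step by step outside the function so each value can be inspected; the data is restated so the snippet runs on its own.

```python
import pandas as pd

# Same data as the question's raw_data, written compactly
raw_data = {
    'Country': ['UK'] * 21,
    'Product': list('A' * 4 + 'B' * 8 + 'C' * 3 + 'D' * 6),
    'Week': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 1, 2, 3, 4, 5, 6],
    'val': [5, 4, 3, 1, 5, 6, 7, 8, 9, 10, 11, 12, 5, 5, 5, 5, 6, 7, 8, 9, 10],
}
df2 = pd.DataFrame(raw_data)
mapping = pd.DataFrame({'Product': ['A', 'C'], 'Product1': ['B', 'D']})

row = mapping.iloc[0]                       # Product='A', Product1='B'
rows_1 = df2[df2.Product == row.Product]    # all rows of product A
nr_1 = rows_1.index.size                    # number of A records
lastWk_1 = rows_1.Week.iat[-1]              # week of A's last row
# B rows from week lastWk_1 - 1 onward, truncated to nr_1 rows
rows_2 = df2[df2.Product.eq(row.Product1) & df2.Week.ge(lastWk_1 - 1)].iloc[:nr_1]
print(nr_1, lastWk_1, rows_2.Week.tolist())  # 4 4 [3, 4, 5, 6]
```

Note that `.iat[-1]` and `.iloc[:nr_1]` rely on each product's rows already being in ascending Week order, as they are in df2; with unordered data, sorting first (e.g. `df.sort_values(['Product', 'Week'])`) would be needed.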

Then call it the following way:

result = pd.concat([selRows(row, grp)
                    for _, grp in df2.groupby('Country')
                    for _, row in mapping.iterrows()])

The list comprehension above creates a list of DataFrames, the results of calling selRows on:

each group of rows from df2, for consecutive countries (the outer loop),
each row from mapping (the inner loop).
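The nesting order of the comprehension corresponds to the explicit loops in this sketch. It restates the answer's selRows and the question's data so it runs on its own; the list name `frames` is only for illustration.

```python
import pandas as pd

def selRows(row, df):
    rows_1 = df[df.Product == row.Product]
    nr_1 = rows_1.index.size
    lastWk_1 = rows_1.Week.iat[-1]
    rows_2 = df[df.Product.eq(row.Product1) & df.Week.ge(lastWk_1 - 1)].iloc[:nr_1]
    return pd.concat([rows_1, rows_2])

# Same data as the question's raw_data, written compactly
raw_data = {
    'Country': ['UK'] * 21,
    'Product': list('A' * 4 + 'B' * 8 + 'C' * 3 + 'D' * 6),
    'Week': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 1, 2, 3, 4, 5, 6],
    'val': [5, 4, 3, 1, 5, 6, 7, 8, 9, 10, 11, 12, 5, 5, 5, 5, 6, 7, 8, 9, 10],
}
df2 = pd.DataFrame(raw_data)
mapping = pd.DataFrame({'Product': ['A', 'C'], 'Product1': ['B', 'D']})

frames = []
for _, grp in df2.groupby('Country'):      # outer loop: one group per country
    for _, row in mapping.iterrows():      # inner loop: one mapping row per pair
        frames.append(selRows(row, grp))
result = pd.concat(frames)
print(result)
```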

Then concat concatenates all of them into a single DataFrame.
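Because selRows keeps df2's original index labels, the records that were dropped (the ones the question wants removed) can be listed by dropping the kept labels from df2. This is a minimal sketch that rebuilds df2 and result from the answer so it runs on its own; `removed` is an illustrative name.

```python
import pandas as pd

def selRows(row, df):
    rows_1 = df[df.Product == row.Product]
    nr_1 = rows_1.index.size
    lastWk_1 = rows_1.Week.iat[-1]
    rows_2 = df[df.Product.eq(row.Product1) & df.Week.ge(lastWk_1 - 1)].iloc[:nr_1]
    return pd.concat([rows_1, rows_2])

# Same data as the question's raw_data, written compactly
raw_data = {
    'Country': ['UK'] * 21,
    'Product': list('A' * 4 + 'B' * 8 + 'C' * 3 + 'D' * 6),
    'Week': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 1, 2, 3, 4, 5, 6],
    'val': [5, 4, 3, 1, 5, 6, 7, 8, 9, 10, 11, 12, 5, 5, 5, 5, 6, 7, 8, 9, 10],
}
df2 = pd.DataFrame(raw_data)
mapping = pd.DataFrame({'Product': ['A', 'C'], 'Product1': ['B', 'D']})

result = pd.concat([selRows(row, grp)
                    for _, grp in df2.groupby('Country')
                    for _, row in mapping.iterrows()])

# Rows of df2 whose index labels are not in result were filtered out
removed = df2.drop(result.index)
print(removed)
```

This relies on result having no duplicate index labels, which holds here because each product appears in only one mapping row.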
