Rene Chan
Rene Chan

Reputation: 985

Aggregating two dataframe in Python with lookup function

(This is the continuation of my previous question)

I now have two table: df1 (showing products and their classification per day):

            ABC CDE FGH
4/12/2021   S1  S1  S1
4/13/2021   S1  S1  S3
4/14/2021   S1  S2  S2
4/15/2021   S3  S2  S3

and df2 (Showing table and the quantity in inventory per day)

            ABC CDE FGH
4/12/2021   94  19  101
4/13/2021   93  19  102
4/14/2021   94  18  103
4/15/2021   90  17  110

I am looking top build a df3 table that sum the the quantities per classification so that it looks like:

            PS1 PS2 PS3
4/12/2021   214 0   0
4/13/2021   112 0   102
4/14/2021   94  121 0
4/15/2021   0   17  200

The simple yet highly inefficient approach that I did again here is to loop through every product in df1 columns, and then loop through every date, then select the value and do the same with classification table, then row by row build df3. This process, if run every day takes about 30 minutes.

Is there a python way of doing please?

One challenge here is that, there can be more columns in df1 or df2, we cannot use column labels in the process. That is the another difficulty.

Upvotes: 2

Views: 94

Answers (1)

Nk03
Nk03

Reputation: 14949

Let's merge the 2 dataframe 1st. I've renamed the date columns in both the dataframes as the index.

merged_df = df1.merge(df2, how='left',on ='index')
print(merged_df)

Output -

index ABC_x CDE_x FGH_x ABC_y CDE_y FGH_y
0 4/12/2021 S1 S1 S1 94 19 101
1 4/13/2021 S1 S1 S3 93 19 102
2 4/14/2021 S1 S2 S2 94 18 103
3 4/15/2021 S3 S2 S3 90 17 110

Now, we have the merged dataset and we can do transformation row by row by apply method. I've created one transform method to do the task -


def transform_values(x):
    # print(x['FGH_x'])
    dict1 = {'S1': 'ABC_y', 'S2':'CDE_y', 'S3': 'FGH_y'}
    if x['ABC_x'] != 'S1':
        # print(x['ABC_x'])
        x[dict1[x['ABC_x']]] = x[dict1[x['ABC_x']]] + x['ABC_y']
        x['ABC_y'] = 0
     
    if x['CDE_x'] != 'S2':
        # print(x['CDE_x'])
        x[dict1[x['CDE_x']]] = x[dict1[x['CDE_x']]] + x['CDE_y']
        x['CDE_y'] = 0
        
    if x['FGH_x'] != 'S3':
        print(x[dict1[x['FGH_x']]])
        x[dict1[x['FGH_x']]] = x[dict1[x['FGH_x']]] + x['FGH_y']
        x['FGH_y'] = 0
    
    return x
        
final_df =  merged_df.apply(lambda x : transform_values(x), axis=1)

print(final_df)

Output -


       index ABC_x CDE_x FGH_x  ABC_y  CDE_y  FGH_y
0  4/12/2021    S1    S1    S1    214      0      0
1  4/13/2021    S1    S1    S3    112      0    102
2  4/14/2021    S1    S2    S2     94    121      0
3  4/15/2021    S3    S2    S3      0     17    200

Drop the useless columns to get desired output

Upvotes: 1

Related Questions