Reputation: 985
(This is the continuation of my previous question)
I now have two table: df1 (showing products and their classification per day):
ABC CDE FGH
4/12/2021 S1 S1 S1
4/13/2021 S1 S1 S3
4/14/2021 S1 S2 S2
4/15/2021 S3 S2 S3
and df2 (Showing table and the quantity in inventory per day)
ABC CDE FGH
4/12/2021 94 19 101
4/13/2021 93 19 102
4/14/2021 94 18 103
4/15/2021 90 17 110
I am looking top build a df3 table that sum the the quantities per classification so that it looks like:
PS1 PS2 PS3
4/12/2021 214 0 0
4/13/2021 112 0 102
4/14/2021 94 121 0
4/15/2021 0 17 200
The simple yet highly inefficient approach that I did again here is to loop through every product in df1 columns, and then loop through every date, then select the value and do the same with classification table, then row by row build df3. This process, if run every day takes about 30 minutes.
Is there a python way of doing please?
One challenge here is that, there can be more columns in df1 or df2, we cannot use column labels in the process. That is the another difficulty.
Upvotes: 2
Views: 94
Reputation: 14949
Let's merge the 2 dataframe 1st. I've renamed the date columns in both the dataframes as the index.
merged_df = df1.merge(df2, how='left',on ='index')
print(merged_df)
Output -
index | ABC_x | CDE_x | FGH_x | ABC_y | CDE_y | FGH_y | |
---|---|---|---|---|---|---|---|
0 | 4/12/2021 | S1 | S1 | S1 | 94 | 19 | 101 |
1 | 4/13/2021 | S1 | S1 | S3 | 93 | 19 | 102 |
2 | 4/14/2021 | S1 | S2 | S2 | 94 | 18 | 103 |
3 | 4/15/2021 | S3 | S2 | S3 | 90 | 17 | 110 |
Now, we have the merged dataset and we can do transformation row by row by apply method. I've created one transform method to do the task -
def transform_values(x):
# print(x['FGH_x'])
dict1 = {'S1': 'ABC_y', 'S2':'CDE_y', 'S3': 'FGH_y'}
if x['ABC_x'] != 'S1':
# print(x['ABC_x'])
x[dict1[x['ABC_x']]] = x[dict1[x['ABC_x']]] + x['ABC_y']
x['ABC_y'] = 0
if x['CDE_x'] != 'S2':
# print(x['CDE_x'])
x[dict1[x['CDE_x']]] = x[dict1[x['CDE_x']]] + x['CDE_y']
x['CDE_y'] = 0
if x['FGH_x'] != 'S3':
print(x[dict1[x['FGH_x']]])
x[dict1[x['FGH_x']]] = x[dict1[x['FGH_x']]] + x['FGH_y']
x['FGH_y'] = 0
return x
final_df = merged_df.apply(lambda x : transform_values(x), axis=1)
print(final_df)
Output -
index ABC_x CDE_x FGH_x ABC_y CDE_y FGH_y
0 4/12/2021 S1 S1 S1 214 0 0
1 4/13/2021 S1 S1 S3 112 0 102
2 4/14/2021 S1 S2 S2 94 121 0
3 4/15/2021 S3 S2 S3 0 17 200
Drop the useless columns to get desired output
Upvotes: 1