Reputation: 27
I have two dataframes:
df_id:
name id
region 01 850
region 01 15062
region 02 851
df_combination:
Origin destination total
region 01 region 01 1954
region 01 region 02 39
I have to perform all the possible combinations between region 01 and region 02 and divide the total between the total number of combinations.
Something like this output:
Origin_id Destination_id Total_division
850 850 488.5
850 15062 488.5
15062 850 488.5
15062 15062 488.5
850 851 19.5
15062 851 19.5
I have more than 300 regions, so I wonder if do it by python code (maybe loop) would be possible.
Upvotes: 0
Views: 88
Reputation: 11361
Here's another suggestion:
df = df_combination.merge(df_id, left_on='Origin', right_on='name')
df = df.merge(df_id, left_on='destination', right_on='name')
for _, group in df.groupby(['Origin', 'destination']):
df.loc[group.index, 'total'] /= group.shape[0]
df = df[['id_x', 'id_y', 'total']].rename(
columns={'id_x': 'Origin_id', 'id_y': 'Destination_id',
'total': 'Total_division'}
)
Result (print(df)
):
Origin_id Destination_id Total_division
0 850 850 488.5
1 850 15062 488.5
2 15062 850 488.5
3 15062 15062 488.5
4 850 851 19.5
5 15062 851 19.5
Upvotes: 0
Reputation: 2118
This code will give you exactly what you want and efficiently ;-)
It builds a dataframe for each row in df_combination and concatenate them all at the end.
import pandas as pd
from itertools import product
dict_region_to_ids = {reg: list(ids) for reg, ids in df_id.groupby("name")["id"]}
dfs = []
for r1, r2, total in df_combination.itertuples(index=False):
df = pd.DataFrame(product(dict_region_to_ids[r1], dict_region_to_ids[r2]),
columns=["Origin_id", "Destination_id"])
df["Total_division"] = total / len(df)
dfs.append(df)
df = pd.concat(dfs)
df
Upvotes: 1