Reputation: 113
I have a dataframe called Traffic:
Source Method Destination Weight
0 LA Ground NY 20
1 LA Ground NY 15
2 LA Ground LV 10
3 LA Air LV 5
I'd like to end up with a dataframe like so:
Source Ground Air
0 LA LV: 10, NY: 35 LV: 5
Where the 4 rows are combined into 1 based off the Source. The traffic methods are then further broken up by their destinations in ascending order. If there are multiple entries from say LA->NY of type Ground, add the weights.
Ground/Air columns would be strings following the format of "Destination:TotalWeight".
This pattern would continue all the way down for any other Sources such as 'WA'...
Upvotes: 0
Views: 98
Reputation: 26676
One way, not so 'concise' though.
Groupby
to sum weight. Create a column that dicts
destination and total weight per group. Another groupby
to put the dicts in list. Then pivot
.
g=df.groupby(['Source','Method','Destination']).agg(Weight=('Weight','sum')).reset_index()
g['dict']=g.Destination.str.cat(g.Weight.astype(str),sep=':')
k=g.groupby(['Source','Method'])['dict'].agg(list).apply(','.join).reset_index()
print(k.pivot(index='Source', columns='Method', values='dict').reset_index())
Gives:
Method Source Air Ground
0 LA LV:5 LV:10,NY:35
Upvotes: 2
Reputation: 326
initialize
df = pd.DataFrame(data= [["LA","Ground","NY",20],
["LA","Ground","NY",15],
["LA","Ground","LV",10],
["LA","Air","LV",5]],
columns=["Source","Method","Destination","Weight"])
do stuff
# add weights when same source, method & destination
df_ = df.groupby(["Source","Method","Destination"], as_index=False)['Weight'].sum()
# conatenate destination and weight strings
df_['Dest_Wt'] = df_['Destination'].str.cat(df_['Weight'].astype(str), sep = ': ')
shipping_summary = (df_.groupby(["Source","Method"])['Dest_Wt']
.apply(', '.join)
.unstack(level=1)
.reset_index())
shipping_summary.columns.name = None
print(shipping_summary)
output
Source Air Ground
0 LA LV: 5 LV: 10, NY: 35
Upvotes: 1
Reputation: 8768
This code should give you what you are looking for.
ndf = df.groupby(['Source','Method','Destination']).sum()
ndf = ndf.reset_index()
ndf['Weight'] = ndf['Weight'].astype(str)
ndf = ndf.groupby(['Source','Method']).agg({'Source':lambda x: x.max(),'Destination':
lambda x: ','.join(x),'Weight':lambda x: ','.join(x)})
fdf = ndf[['Destination','Weight']].apply(lambda x: x.str.split(',').explode())
fdf.reset_index(inplace=True)
fdf['Combined'] = fdf['Destination'] + ':' + fdf['Weight']
fdf = fdf[['Source','Method','Combined']]
fdf = pd.pivot_table(fdf,index='Source',columns = 'Method',aggfunc = lambda x: ',
'.join(x)).reset_index()
Upvotes: 1