thePandasFriend
thePandasFriend

Reputation: 113

How can I consolidate multiple rows into a single row based off their values in a Pandas Dataframe?

I have a dataframe called Traffic:

   Source   Method    Destination    Weight
0     LA    Ground             NY        20 
1     LA    Ground             NY        15
2     LA    Ground             LV        10
3     LA       Air             LV         5

I'd like to end up with a dataframe like so:

   Source            Ground        Air
0      LA    LV: 10, NY: 35      LV: 5

Where the 4 rows are combined into 1 based off the Source. The traffic methods are then further broken up by their destinations in ascending order. If there are multiple entries from say LA->NY of type Ground, add the weights.

Ground/Air columns would be strings following the format of "Destination:TotalWeight".

This pattern would continue all the way down for any other Sources such as 'WA'...

Upvotes: 0

Views: 98

Answers (3)

wwnde
wwnde

Reputation: 26676

One way, not so 'concise' though.

Groupby to sum weight. Create a column that dicts destination and total weight per group. Another groupby to put the dicts in list. Then pivot.

g=df.groupby(['Source','Method','Destination']).agg(Weight=('Weight','sum')).reset_index()
g['dict']=g.Destination.str.cat(g.Weight.astype(str),sep=':')
k=g.groupby(['Source','Method'])['dict'].agg(list).apply(','.join).reset_index()
print(k.pivot(index='Source', columns='Method', values='dict').reset_index())

Gives:

    Method Source   Air       Ground
0          LA       LV:5   LV:10,NY:35

Upvotes: 2

proteome
proteome

Reputation: 326

initialize

df = pd.DataFrame(data= [["LA","Ground","NY",20],
                         ["LA","Ground","NY",15],
                         ["LA","Ground","LV",10],
                         ["LA","Air","LV",5]],
                  columns=["Source","Method","Destination","Weight"])

do stuff

# add weights when same source, method & destination
df_ = df.groupby(["Source","Method","Destination"], as_index=False)['Weight'].sum()
# conatenate destination and weight strings
df_['Dest_Wt'] = df_['Destination'].str.cat(df_['Weight'].astype(str), sep = ': ')

shipping_summary = (df_.groupby(["Source","Method"])['Dest_Wt']
           .apply(', '.join)
           .unstack(level=1)
           .reset_index())
shipping_summary.columns.name = None

print(shipping_summary)

output

   Source      Air            Ground
0      LA    LV: 5    LV: 10, NY: 35

Upvotes: 1

rhug123
rhug123

Reputation: 8768

This code should give you what you are looking for.

ndf = df.groupby(['Source','Method','Destination']).sum()
ndf = ndf.reset_index()
ndf['Weight'] = ndf['Weight'].astype(str)
ndf = ndf.groupby(['Source','Method']).agg({'Source':lambda x: x.max(),'Destination': 
lambda x: ','.join(x),'Weight':lambda x: ','.join(x)})
fdf = ndf[['Destination','Weight']].apply(lambda x: x.str.split(',').explode())
fdf.reset_index(inplace=True)
fdf['Combined'] = fdf['Destination'] + ':' + fdf['Weight']
fdf = fdf[['Source','Method','Combined']]
fdf = pd.pivot_table(fdf,index='Source',columns = 'Method',aggfunc = lambda x: ', 
'.join(x)).reset_index()

Upvotes: 1

Related Questions