barciewicz
barciewicz

Reputation: 3793

Summing up two tables by values common in one of their columns

d is holding data about people's planned and actual work duration in hours in either owner or manager role. Note that one person can - but does not have to - be an owner and manager at the same time.

I need to rearrange d in a way that gives me all the names in one column and in the other columns the planned and actual hours in each role.

The code below achieves that, but is not very pretty.

How can I utilize some native Pandas features to get the same result, but with less typing?

import pandas as pd

d = {
    'owner': ['mike', 'john', 'jake', 'lucy', 'mary', 'hans'],
    'owner planned': [54, 67, 52, 19, 87, 45],
    'owner actual': [12, 54, 3, 67, 84, 22],
    'manager': ['andrew', 'tom', 'john', 'mike', 'hans', 'paul'],
    'manager planned': [13, 432, 453, 765, 432, 234], 
    'manager actual': [22, 33, 44, 55, 66, 77],
}

df = pd.DataFrame(d)
names = list(set(df['owner'].tolist() + df['manager'].tolist()))
output = {}

for name in names:
    op = df[df['owner'] == name]['owner planned'].sum()
    oa = df[df['owner'] == name]['owner actual'].sum()
    mp = df[df['manager'] == name]['manager planned'].sum()
    ma = df[df['manager'] == name]['manager actual'].sum()

    output.setdefault('owner_planned', []).append(op)
    output.setdefault('owner_actual', []).append(oa)
    output.setdefault('manager_planned', []).append(mp)
    output.setdefault('manager_actual', []).append(ma)
    output.setdefault('names', []).append(name)

df2 = pd.DataFrame(output)
print(df2)

Upvotes: 1

Views: 47

Answers (2)

cs95
cs95

Reputation: 402814

Use filter, concat, and DataFrameGroupBy.sum:

u = df.filter(like='owner').rename({'owner':'names'}, axis=1)
v = df.filter(like='manager').rename({'manager':'names'}, axis=1)

pd.concat([u,v], sort=False).groupby('names').sum()

        owner planned  owner actual  manager planned  manager actual
names                                                               
andrew            0.0           0.0             13.0            22.0
hans             45.0          22.0            432.0            66.0
jake             52.0           3.0              0.0             0.0
john             67.0          54.0            453.0            44.0
lucy             19.0          67.0              0.0             0.0
mary             87.0          84.0              0.0             0.0
mike             54.0          12.0            765.0            55.0
paul              0.0           0.0            234.0            77.0
tom               0.0           0.0            432.0            33.0

Upvotes: 1

perl
perl

Reputation: 9941

z = pd.concat([df.iloc[:,:3], df.iloc[:,3:]], sort=True)
z['name'] = z[['owner', 'manager']].mode(1)[0]
z.groupby('name').sum()

Out:

    manager actual  manager planned owner actual    owner planned
name                
andrew  22.0    13.0    0.0 0.0
hans    66.0    432.0   22.0    45.0
jake    0.0 0.0 3.0 52.0
john    44.0    453.0   54.0    67.0
lucy    0.0 0.0 67.0    19.0
mary    0.0 0.0 84.0    87.0
mike    55.0    765.0   12.0    54.0
paul    77.0    234.0   0.0 0.0
tom 33.0    432.0   0.0 0.0

Upvotes: 1

Related Questions