Reputation: 3793
d
is holding data about people's planned and actual work duration in hours in either owner or manager role.
Note that one person can - but does not have to - be an owner and manager at the same time.
I need to rearrange d
in a way that gives me all the names in one column and in the other columns
the planned and actual hours in each role.
The code below achieves that, but is not very pretty.
How can I utilize some native Pandas features to get the same result, but with less typing?
import pandas as pd
d = {
'owner': ['mike', 'john', 'jake', 'lucy', 'mary', 'hans'],
'owner planned': [54, 67, 52, 19, 87, 45],
'owner actual': [12, 54, 3, 67, 84, 22],
'manager': ['andrew', 'tom', 'john', 'mike', 'hans', 'paul'],
'manager planned': [13, 432, 453, 765, 432, 234],
'manager actual': [22, 33, 44, 55, 66, 77],
}
df = pd.DataFrame(d)
names = list(set(df['owner'].tolist() + df['manager'].tolist()))
output = {}
for name in names:
op = df[df['owner'] == name]['owner planned'].sum()
oa = df[df['owner'] == name]['owner actual'].sum()
mp = df[df['manager'] == name]['manager planned'].sum()
ma = df[df['manager'] == name]['manager actual'].sum()
output.setdefault('owner_planned', []).append(op)
output.setdefault('owner_actual', []).append(oa)
output.setdefault('manager_planned', []).append(mp)
output.setdefault('manager_actual', []).append(ma)
output.setdefault('names', []).append(name)
df2 = pd.DataFrame(output)
print(df2)
Upvotes: 1
Views: 47
Reputation: 402814
Use filter
, concat
, and DataFrameGroupBy.sum
:
u = df.filter(like='owner').rename({'owner':'names'}, axis=1)
v = df.filter(like='manager').rename({'manager':'names'}, axis=1)
pd.concat([u,v], sort=False).groupby('names').sum()
owner planned owner actual manager planned manager actual
names
andrew 0.0 0.0 13.0 22.0
hans 45.0 22.0 432.0 66.0
jake 52.0 3.0 0.0 0.0
john 67.0 54.0 453.0 44.0
lucy 19.0 67.0 0.0 0.0
mary 87.0 84.0 0.0 0.0
mike 54.0 12.0 765.0 55.0
paul 0.0 0.0 234.0 77.0
tom 0.0 0.0 432.0 33.0
Upvotes: 1
Reputation: 9941
z = pd.concat([df.iloc[:,:3], df.iloc[:,3:]], sort=True)
z['name'] = z[['owner', 'manager']].mode(1)[0]
z.groupby('name').sum()
Out:
manager actual manager planned owner actual owner planned
name
andrew 22.0 13.0 0.0 0.0
hans 66.0 432.0 22.0 45.0
jake 0.0 0.0 3.0 52.0
john 44.0 453.0 54.0 67.0
lucy 0.0 0.0 67.0 19.0
mary 0.0 0.0 84.0 87.0
mike 55.0 765.0 12.0 54.0
paul 77.0 234.0 0.0 0.0
tom 33.0 432.0 0.0 0.0
Upvotes: 1