Reputation: 305
I have two pandas dataframes. First have data how many hours person spent doing what that day. Second dataframe is Sum of the day and the hours the person actually were in work. Examples here:
df_1:
| name | id | description | hours_worked | date |
| ---- | -- | -------------- | ------------ | ---------- |
| John | 1 | implementation | 5 | 2020-11-16 |
| John | 1 | refactor | 3 | 2020-11-16 |
| John | 1 | implementation | 8 | 2020-11-17 |
| Paul | 2 | refactor | 4 | 2020-11-16 |
df_2:
| name | id | hours_charged | hours_worked | date |
| ---- | -- | -------------- | ------------ | ---------- |
| John | 1 | 8 | 8 | 2020-11-16 |
| John | 1 | 8 | 8 | 2020-11-17 |
| Paul | 2 | 8 | 4 | 2020-11-16 |
Is there any way how to combine these two DFs and get this result:
df_final:
| name | id | description | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
| ---- | -- | ---------------- | ------ | ------ | --------- | -------- | ------ | -------- | ------ |
| John | 1 | hours_worked | 8 | 8 | 0 | 0 | 0 | 0 | 0 |
| John | 1 | hours_charged | 8 | 8 | 0 | 0 | 0 | 0 | 0 |
| John | 1 | implementation | 5 | 8 | 0 | 0 | 0 | 0 | 0 |
| John | 1 | refactor | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Paul | 2 | hours_worked | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| Paul | 2 | hours_charged | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| Paul | 2 | refactor | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
Basically transpose the dates and then get on the row data from hours_worked, hours_charged and then list all hours_worked based on description. I tried to merge the two DFs but I dont know how to transpose it this way.
Upvotes: 1
Views: 86
Reputation: 862641
Use DataFrame.melt
for unpivot for second df
, join by concat
, convert datetimes by Series.dt.day_name
and use DataFrame.pivot_table
with some aggregate function, e.g. sum
, last add missing columns by list:
df1 = df_1.rename(columns={'hours_worked':'value'})
df2 = df_2.melt(['name','id','date'], var_name='description')
df = pd.concat([df1, df2])
df['date'] = pd.to_datetime(df['date']).dt.day_name()
days = ['Monday', 'Tuesday','Wednesday', 'Thursday', 'Friday','Saturday','Sunday']
df1 = (df.pivot_table(index=['name','id','description'],
columns='date',
values='value',
aggfunc='sum', fill_value=0)
.reindex(days, fill_value=0, axis=1)
.reset_index()
.rename_axis(None,axis=1))
print (df1)
name id description Monday Tuesday Wednesday Thursday Friday \
0 John 1 hours_charged 8 8 0 0 0
1 John 1 hours_worked 8 8 0 0 0
2 John 1 implementation 5 8 0 0 0
3 John 1 refactor 3 0 0 0 0
4 Paul 2 hours_charged 8 0 0 0 0
5 Paul 2 hours_worked 4 0 0 0 0
6 Paul 2 refactor 4 0 0 0 0
Saturday Sunday
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
Upvotes: 2