pipikej
pipikej

Reputation: 305

pandas - join two dataframe and then transpose by a date

I have two pandas dataframes. First have data how many hours person spent doing what that day. Second dataframe is Sum of the day and the hours the person actually were in work. Examples here:

df_1:
| name | id | description    | hours_worked | date       |
| ---- | -- | -------------- | ------------ | ---------- |
| John | 1  | implementation | 5            | 2020-11-16 |
| John | 1  | refactor       | 3            | 2020-11-16 |    
| John | 1  | implementation | 8            | 2020-11-17 |
| Paul | 2  | refactor       | 4            | 2020-11-16 |

df_2:
| name | id | hours_charged  | hours_worked | date       |
| ---- | -- | -------------- | ------------ | ---------- |
| John | 1  | 8              | 8            | 2020-11-16 |   
| John | 1  | 8              | 8            | 2020-11-17 |
| Paul | 2  | 8              | 4            | 2020-11-16 |

Is there any way how to combine these two DFs and get this result:

df_final:
| name | id | description      | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
| ---- | -- | ---------------- | ------ | ------  | --------- | -------- | ------ | -------- | ------ |
| John | 1  | hours_worked     |   8    | 8       | 0         | 0        | 0      | 0        | 0      | 
| John | 1  | hours_charged    |   8    | 8       | 0         | 0        | 0      | 0        | 0      | 
| John | 1  | implementation   |   5    | 8       | 0         | 0        | 0      | 0        | 0      | 
| John | 1  | refactor         |   3    | 0       | 0         | 0        | 0      | 0        | 0      | 
| Paul | 2  | hours_worked     |   4    | 0       | 0         | 0        | 0      | 0        | 0      | 
| Paul | 2  | hours_charged    |   8    | 0       | 0         | 0        | 0      | 0        | 0      |     
| Paul | 2  | refactor         |   4    | 0       | 0         | 0        | 0      | 0        | 0      | 

Basically transpose the dates and then get on the row data from hours_worked, hours_charged and then list all hours_worked based on description. I tried to merge the two DFs but I dont know how to transpose it this way.

Upvotes: 1

Views: 86

Answers (1)

jezrael
jezrael

Reputation: 862641

Use DataFrame.melt for unpivot for second df, join by concat, convert datetimes by Series.dt.day_name and use DataFrame.pivot_table with some aggregate function, e.g. sum, last add missing columns by list:

df1 = df_1.rename(columns={'hours_worked':'value'})
df2 = df_2.melt(['name','id','date'], var_name='description')

df = pd.concat([df1, df2])
df['date'] = pd.to_datetime(df['date']).dt.day_name()

days = ['Monday', 'Tuesday','Wednesday', 'Thursday', 'Friday','Saturday','Sunday']
df1 = (df.pivot_table(index=['name','id','description'], 
                      columns='date', 
                      values='value', 
                      aggfunc='sum', fill_value=0)
        .reindex(days, fill_value=0, axis=1)
        .reset_index()
        .rename_axis(None,axis=1))
print (df1)
   name  id     description  Monday  Tuesday  Wednesday  Thursday  Friday  \
0  John   1   hours_charged       8        8          0         0       0   
1  John   1    hours_worked       8        8          0         0       0   
2  John   1  implementation       5        8          0         0       0   
3  John   1        refactor       3        0          0         0       0   
4  Paul   2   hours_charged       8        0          0         0       0   
5  Paul   2    hours_worked       4        0          0         0       0   
6  Paul   2        refactor       4        0          0         0       0   

   Saturday  Sunday  
0         0       0  
1         0       0  
2         0       0  
3         0       0  
4         0       0  
5         0       0  
6         0       0  

Upvotes: 2

Related Questions