Pandas dataframe Groupby and retrieve date range

Question

Here is my dataframe that I am working on. There are two pay periods defined: first 15 days and last 15 days for each month.

         date  employee_id hours_worked   id job_group  report_id
0  2016-11-14            2         7.50  385         B         43
1  2016-11-15            2         4.00  386         B         43
2  2016-11-30            2         4.00  387         B         43
3  2016-11-01            3        11.50  388         A         43
4  2016-11-15            3         6.00  389         A         43
5  2016-11-16            3         3.00  390         A         43
6  2016-11-30            3         6.00  391         A         43
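
To make the sample easier to reproduce, the frame above can be rebuilt roughly as follows. This is only a sketch: the pay_period column and its "first half" / "second half" labels are illustrative assumptions, not part of the original data, and simply tag each row with the half of the month it falls in.

```python
import pandas as pd

# Rebuild the sample frame shown above
df = pd.DataFrame({
    "date": ["2016-11-14", "2016-11-15", "2016-11-30", "2016-11-01",
             "2016-11-15", "2016-11-16", "2016-11-30"],
    "employee_id": [2, 2, 2, 3, 3, 3, 3],
    "hours_worked": [7.5, 4.0, 4.0, 11.5, 6.0, 3.0, 6.0],
    "id": [385, 386, 387, 388, 389, 390, 391],
    "job_group": ["B", "B", "B", "A", "A", "A", "A"],
    "report_id": [43] * 7,
})
df["date"] = pd.to_datetime(df["date"])

# Hypothetical helper column: days 1-15 belong to the first pay period,
# every later day belongs to the second
df["pay_period"] = df["date"].dt.day.le(15).map(
    {True: "first half", False: "second half"}
)
print(df[["date", "employee_id", "pay_period"]])
```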

I need to group by employee_id and job_group, and at the same time get the pay-period date for each grouped row.

For example, the grouped results would look like the following:

Expected Output:

         date  employee_id  hours_worked job_group  report_id
1  2016-11-15            2         11.50         B         43
2  2016-11-30            2          4.00         B         43
4  2016-11-15            3         17.50         A         43
5  2016-11-16            3          9.00         A         43

Is this possible using pandas dataframe groupby?

jezrael · Accepted Answer

Use Grouper with the SM (semi-month end) frequency, then add SemiMonthEnd to move each group's date to the end of its pay period:

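import pandas as pd

df['date'] = pd.to_datetime(df['date'])

# Sum the hours and keep the first report_id within each pay period
d = {'hours_worked':'sum','report_id':'first'}
# 'SM' is the semi-month-end frequency (15th and last day of the month);
# pandas 2.2+ spells this alias 'SME'
df = (df.groupby(['employee_id','job_group',pd.Grouper(freq='SM',key='date', closed='right')])
       .agg(d)
       .reset_index())

# Each bin is labelled with its left edge, so shift it to the period's end date
df['date'] = df['date'] + pd.offsets.SemiMonthEnd(1)
print(df)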
   employee_id job_group       date  hours_worked  report_id
0            2         B 2016-11-15          11.5         43
1            2         B 2016-11-30           4.0         43
2            3         A 2016-11-15          17.5         43
3            3         A 2016-11-30           9.0         43
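
If the semi-month frequency alias is not available or behaves differently in your pandas version, the pay-period end date can also be computed by hand and used as an ordinary grouping key. The following is a minimal sketch of that alternative, not the approach from the answer above. The names period_end and result are assumptions introduced for illustration, and it takes the 15th as the end of the first period and the last day of the month as the end of the second.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2016-11-14", "2016-11-15", "2016-11-30", "2016-11-01",
             "2016-11-15", "2016-11-16", "2016-11-30"],
    "employee_id": [2, 2, 2, 3, 3, 3, 3],
    "hours_worked": [7.5, 4.0, 4.0, 11.5, 6.0, 3.0, 6.0],
    "id": [385, 386, 387, 388, 389, 390, 391],
    "job_group": ["B", "B", "B", "A", "A", "A", "A"],
    "report_id": [43] * 7,
})
df["date"] = pd.to_datetime(df["date"])

# The 15th of each row's month (first day of the month plus 14 days)
mid_month = df["date"].dt.to_period("M").dt.to_timestamp() + pd.Timedelta(days=14)
# The last day of each row's month (dates already on month end are unchanged)
month_end = df["date"] + pd.offsets.MonthEnd(0)

# Assumed column name: days 1-15 map to the 15th, later days to month end
df["period_end"] = mid_month.where(df["date"].dt.day <= 15, month_end)

result = (
    df.groupby(["employee_id", "job_group", "period_end"], as_index=False)
      .agg(hours_worked=("hours_worked", "sum"), report_id=("report_id", "first"))
)
print(result)
```

Because period_end is a plain column, the same idea extends to pay periods that do not line up with a built-in pandas offset.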
