Reputation: 179
I'm trying to work out how many hours each ID is contracted to based on their work status i.e. Part-Time or Full-Time.
To do this, I need to know their weekly total hours when I'm given their daily totals. So what I've done is
FTorPT = {
'Full-time': 30,
'Part-time': 18,
}
df['Hours'] = df['Total shift hours']
df['Contract Hours'] = df['Work Status'].map(FTorPT)
I needed a sum of total hours PER ID so I thought I could aggregate the data first and get weekly hours per ID. Without this, line, it would subtract the daily hours by the Contract Hours
which would be wrong.
W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
Update to this section Even with @Quang Hoang's suggestion to change it to:
W = df.groupby(['ID', 'Contract Hours'], as_index=False)['Hours'].sum()
It still returns the daily hours rather than an accumulation of the hours over the week.
To know how many hours I need to add to meet their minimum, I've done this:
W['Top-up Hours'] = (W['Contract Hours']-W['Hours'])
The issue occurs in this line as I get Key Error: 'Contract Hours'
when the column clearly exists.
Where have I gone wrong in the code?
Also Tried:
Instead of a new dataframe and replacing it with W
, I've also tried obtaining weekly hours by doing this
df['Weekly Hours'] = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
but it returns a Index Error
which I've asked a question for here but I couldn't figure it out - IndexError when I added another .groupby() with pandas.
Sample Data
import pandas as pd
import numpy as np
df = {'Work Status': ['Part-time', 'Part-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Part-time', 'Part-time'],
'ID': [1, 1, 2, 2, 2, 3, 3, 4],
'Hours': [4.67, 5.07, 3.50, 4, 3.10, 6.90, 2.92, 5.50, 9, 9],
}
df = pd.DataFrame(data)
FTorPT = {
'Full-time': 30,
'Part-time': 18,
}
df['Contract Hours'] = df['Work Status'].map(FTorPT)
W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
W['Top-up Hours'] = (W['Contract Hours'] - W['Hours'])
W['Total hours'] = np.where(W['Hours'] < W['Contract Hours'], W['Top-up Hours'] + W['Hours'], W['Hours'])
Per_Day = W.groupby(['Date', 'ID', 'Work Status']).agg({'Hours': 'sum'})
Employees = W.groupby(['ID', 'Top-up Hours']).agg({'Hours': 'sum', 'Date':'nunique'})
df.to_excel(r'filepath', header=True)
book = load_workbook(r'filepath')
writer = pd.ExcelWriter(r'filepath', engine = 'openpyxl')
writer.book = book
Per_Day.to_excel(writer, sheet_name = 'Per Day')
Employees.to_excel(writer, sheet_name = 'Employees')
writer.close()
Upvotes: 1
Views: 2137
Reputation: 150825
When you do W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
, Contract Hours
is the second level index, not a column, so the subsequent
W['Top-up Hours'] = (W['Contract Hours'] - W['Hours'])
fails on W['Contract Hours']
. You want
W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'}).reset_index()
Or, better, without using agg
:
W = df.groupby(['ID', 'Contract Hours'], as_index=False)['Hours'].sum()
Upvotes: 2