cchev
cchev

Reputation: 179

KeyError in Pandas when the column name is definitely there

I'm trying to work out how many hours each ID is contracted to based on their work status i.e. Part-Time or Full-Time.

To do this, I need to know their weekly total hours when I'm given their daily totals. So what I've done is

FTorPT = {
'Full-time': 30,
'Part-time': 18,
}
df['Hours'] = df['Total shift hours']

df['Contract Hours'] = df['Work Status'].map(FTorPT)

I needed a sum of total hours PER ID so I thought I could aggregate the data first and get weekly hours per ID. Without this, line, it would subtract the daily hours by the Contract Hours which would be wrong.

W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})

Update to this section Even with @Quang Hoang's suggestion to change it to:

W = df.groupby(['ID', 'Contract Hours'], as_index=False)['Hours'].sum()

It still returns the daily hours rather than an accumulation of the hours over the week.

To know how many hours I need to add to meet their minimum, I've done this:

W['Top-up Hours'] = (W['Contract Hours']-W['Hours'])

The issue occurs in this line as I get Key Error: 'Contract Hours' when the column clearly exists.

Where have I gone wrong in the code?

Also Tried:

Instead of a new dataframe and replacing it with W, I've also tried obtaining weekly hours by doing this

df['Weekly Hours'] = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'}) 

but it returns a Index Error which I've asked a question for here but I couldn't figure it out - IndexError when I added another .groupby() with pandas.

Sample Data

import pandas as pd
import numpy as np

df = {'Work Status': ['Part-time', 'Part-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Part-time', 'Part-time'],
      'ID': [1, 1, 2, 2, 2, 3, 3, 4],
      'Hours': [4.67, 5.07, 3.50, 4, 3.10, 6.90, 2.92, 5.50, 9, 9],
      }

df = pd.DataFrame(data)

FTorPT = {
    'Full-time': 30,
    'Part-time': 18,
}
df['Contract Hours'] = df['Work Status'].map(FTorPT)

W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
W['Top-up Hours'] = (W['Contract Hours'] - W['Hours'])

W['Total hours'] = np.where(W['Hours'] < W['Contract Hours'], W['Top-up Hours'] + W['Hours'], W['Hours'])

Per_Day = W.groupby(['Date', 'ID', 'Work Status']).agg({'Hours': 'sum'})
Employees = W.groupby(['ID', 'Top-up Hours']).agg({'Hours': 'sum', 'Date':'nunique'})

df.to_excel(r'filepath', header=True)

book = load_workbook(r'filepath')
writer = pd.ExcelWriter(r'filepath', engine = 'openpyxl')
writer.book = book

Per_Day.to_excel(writer, sheet_name = 'Per Day')
Employees.to_excel(writer, sheet_name = 'Employees')
writer.close()

Upvotes: 1

Views: 2137

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

When you do W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'}), Contract Hours is the second level index, not a column, so the subsequent W['Top-up Hours'] = (W['Contract Hours'] - W['Hours']) fails on W['Contract Hours']. You want

W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'}).reset_index()

Or, better, without using agg:

W = df.groupby(['ID', 'Contract Hours'], as_index=False)['Hours'].sum()

Upvotes: 2

Related Questions