KeyError in Pandas when the column name is definitely there

Question

I'm trying to work out how many hours each ID is contracted to based on their work status i.e. Part-Time or Full-Time.

To do this, I need to know their weekly total hours when I'm given their daily totals. So what I've done is

FTorPT = {
'Full-time': 30,
'Part-time': 18,
}
df['Hours'] = df['Total shift hours']

df['Contract Hours'] = df['Work Status'].map(FTorPT)

I needed a sum of total hours PER ID so I thought I could aggregate the data first and get weekly hours per ID. Without this, line, it would subtract the daily hours by the Contract Hours which would be wrong.

W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})

Update to this section Even with @Quang Hoang's suggestion to change it to:

W = df.groupby(['ID', 'Contract Hours'], as_index=False)['Hours'].sum()

It still returns the daily hours rather than an accumulation of the hours over the week.

To know how many hours I need to add to meet their minimum, I've done this:

W['Top-up Hours'] = (W['Contract Hours']-W['Hours'])

The issue occurs in this line as I get Key Error: 'Contract Hours' when the column clearly exists.

Where have I gone wrong in the code?

Also Tried:

Instead of a new dataframe and replacing it with W, I've also tried obtaining weekly hours by doing this

df['Weekly Hours'] = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})

but it returns a Index Error which I've asked a question for here but I couldn't figure it out - IndexError when I added another .groupby() with pandas.

Sample Data

import pandas as pd
import numpy as np

df = {'Work Status': ['Part-time', 'Part-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Full-time', 'Part-time', 'Part-time'],
      'ID': [1, 1, 2, 2, 2, 3, 3, 4],
      'Hours': [4.67, 5.07, 3.50, 4, 3.10, 6.90, 2.92, 5.50, 9, 9],
      }

df = pd.DataFrame(data)

FTorPT = {
    'Full-time': 30,
    'Part-time': 18,
}
df['Contract Hours'] = df['Work Status'].map(FTorPT)

W = df.groupby(['ID', 'Contract Hours']).agg({'Hours':'sum'})
W['Top-up Hours'] = (W['Contract Hours'] - W['Hours'])

W['Total hours'] = np.where(W['Hours'] < W['Contract Hours'], W['Top-up Hours'] + W['Hours'], W['Hours'])

Per_Day = W.groupby(['Date', 'ID', 'Work Status']).agg({'Hours': 'sum'})
Employees = W.groupby(['ID', 'Top-up Hours']).agg({'Hours': 'sum', 'Date':'nunique'})

df.to_excel(r'filepath', header=True)

book = load_workbook(r'filepath')
writer = pd.ExcelWriter(r'filepath', engine = 'openpyxl')
writer.book = book

Per_Day.to_excel(writer, sheet_name = 'Per Day')
Employees.to_excel(writer, sheet_name = 'Employees')
writer.close()

KeyError in Pandas when the column name is definitely there

Answers (1)

Related Questions