Reputation: 459
I'm having an issue with two functions I have defined in Python. Both functions have similar operations in the first few lines of the function body, and one will run and the other produces a 'key error' message. I will explain more below, but here are the two functions first.
#define function that looks at the number of claims that have a decider id that was dealer
#normalize by business amount
def decider(df):
#subset dataframe by date
df_sub = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]
#get the dealer id
did = df_sub['dealer_id'].unique()
#subset data further by selecting only records where 'dealer_decide' equals 1
df_dealer_decide = df_sub[df_sub['dealer_decide'] == 1]
#count the number of unique warranty claims
dealer_decide_count = df_dealer_decide['warranty_claim_number'].nunique()
#get the total sales amount for that dealer
total_sales = float(df_sub['amount'].max())
#get the number of warranty claims decided by dealer per $100k in dealer sales
decider_count_phk = dealer_decide_count * (100000/total_sales)
#create a dictionary to store results
output_dict = dict()
output_dict['decider_phk'] = decider_count_phk
output_dict['dealer_id'] = did
output_dict['total_claims_dealer_dec_Q1_2019'] = dealer_decide_count
output_dict['total_sales2019'] = total_sales
#convert resultant dictionary to dataframe
sum_df = pd.DataFrame.from_dict(output_dict)
#return the summarized dataframe
return sum_df
#apply the 'decider' function to each dealer in dataframe 'data'
decider_count = data.groupby('dealer_id').apply(decider)
#define a function that looks at the percentage change between 2018Q4 and 2019Q1 in terms of the number #of claims processed
def turnover(df):
#subset dealer records for Q1
df_subQ1 = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]
#subset dealer records for Q4
df_subQ4 = df[(df['vehicle_repair_date'] >= Q4_sd) & (df['vehicle_repair_date'] <= Q4_ed)]
#get the dealer id
did = df_subQ1['dealer_id'].unique()
#get the unique number of claims for Q1
unique_Q1 = df_subQ1['warranty_claim_number'].nunique()
#get the unique number of claims for Q1
unique_Q4 = df_subQ4['warranty_claim_number'].nunique()
#determine percent change from Q4 to Q1
percent_change = round((1 - (unique_Q1/unique_Q4))*100, ndigits = 1)
#create a dictionary to store results
output_dict = dict()
output_dict['nclaims_Q1_2019'] = unique_Q1
output_dict['nclaims_Q4_2018'] = unique_Q4
output_dict['dealer_id'] = did
output_dict['quarterly_pct_change'] = percent_change
#apply the 'turnover' function to each dealer in 'data' dataframe
dealer_turnover = data.groupby('dealer_id').apply(turnover)
Each function is being applied to the exact same dataset and I am obtaining the dealer id(variable did in function body) in the same way. I am also using the same groupby
then apply
code, but when I run the two functions the function decider
runs as expected, but the function turnover
gives the following error:
KeyError: 'dealer_id'
.
At first I thought it might be a scoping issue, but that doesn't really make sense so if anyone can shed some light on what might be happening I would greatly appreciate it.
Thanks, Curtis
Upvotes: 0
Views: 100
Reputation: 520
IIUC, you are applying turnover function after decider function. You are getting the key error since dealer_id is present as index and not as a column. Try replacing
decider_count = data.groupby('dealer_id').apply(decider)
with
decider_count = data.groupby('dealer_id', as_index=False).apply(decider)
Upvotes: 1