Curtis

Reputation: 459

Similar functions in Python don't produce same result

I'm having an issue with two functions I have defined in Python. Both perform similar operations in the first few lines of the function body, yet one runs and the other raises a KeyError. I will explain more below, but here are the two functions first.

#define a function that counts the claims whose decider was the dealer
#and normalizes the count by business amount
def decider(df):
    #subset dataframe by date
    df_sub = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]

    #get the dealer id
    did = df_sub['dealer_id'].unique()

    #subset data further by selecting only records where 'dealer_decide' equals 1
    df_dealer_decide = df_sub[df_sub['dealer_decide'] == 1]

    #count the number of unique warranty claims
    dealer_decide_count = df_dealer_decide['warranty_claim_number'].nunique()

    #get the total sales amount for that dealer
    total_sales = float(df_sub['amount'].max())

    #get the number of warranty claims decided by dealer per $100k in dealer sales
    decider_count_phk = dealer_decide_count * (100000/total_sales)


    #create a dictionary to store results
    output_dict = dict()
    output_dict['decider_phk'] = decider_count_phk
    output_dict['dealer_id'] = did
    output_dict['total_claims_dealer_dec_Q1_2019'] = dealer_decide_count
    output_dict['total_sales2019'] = total_sales

    #convert resultant dictionary to dataframe
    sum_df = pd.DataFrame.from_dict(output_dict)

    #return the summarized dataframe
    return sum_df

#apply the 'decider' function to each dealer in dataframe 'data'
decider_count = data.groupby('dealer_id').apply(decider)  


#define a function that computes the percentage change in the number of claims processed between 2018Q4 and 2019Q1
def turnover(df):

    #subset dealer records for Q1
    df_subQ1 = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]

    #subset dealer records for Q4
    df_subQ4 = df[(df['vehicle_repair_date'] >= Q4_sd) & (df['vehicle_repair_date'] <= Q4_ed)]

    #get the dealer id
    did = df_subQ1['dealer_id'].unique()

    #get the unique number of claims for Q1
    unique_Q1 = df_subQ1['warranty_claim_number'].nunique()

    #get the unique number of claims for Q4
    unique_Q4 = df_subQ4['warranty_claim_number'].nunique()

    #determine percent change from Q4 to Q1
    percent_change = round((1 - (unique_Q1/unique_Q4))*100, ndigits = 1)

    #create a dictionary to store results
    output_dict = dict()
    output_dict['nclaims_Q1_2019'] = unique_Q1
    output_dict['nclaims_Q4_2018'] = unique_Q4
    output_dict['dealer_id'] = did
    output_dict['quarterly_pct_change'] = percent_change

#apply the 'turnover' function to each dealer in 'data' dataframe    
dealer_turnover = data.groupby('dealer_id').apply(turnover)  

Each function is applied to the exact same dataset, and I obtain the dealer id (variable did in the function body) in the same way. I also use the same groupby-then-apply code, but while the function decider runs as expected, the function turnover gives the following error:

KeyError: 'dealer_id'.

At first I thought it might be a scoping issue, but that doesn't really make sense. If anyone can shed some light on what might be happening, I would greatly appreciate it.

Thanks, Curtis

Upvotes: 0

Views: 100

Answers (1)

Suraj Motaparthy

Reputation: 520

IIUC, you are applying the turnover function after the decider function. You are getting the KeyError because dealer_id is present as the index and not as a column. Try replacing

decider_count = data.groupby('dealer_id').apply(decider)

with

decider_count = data.groupby('dealer_id', as_index=False).apply(decider)
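
To check whether dealer_id is a column or part of the index, a minimal sketch along these lines may help. The toy data is hypothetical (only the column names come from the question), and it only illustrates the index-versus-column distinction; it does not reproduce the groupby call itself.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the question, values are made up.
data = pd.DataFrame({
    "dealer_id": [1, 1, 2],
    "warranty_claim_number": [101, 102, 201],
})

# Moving dealer_id into the index removes it from the columns.
indexed = data.set_index("dealer_id")
in_columns = "dealer_id" in indexed.columns
in_index = "dealer_id" in indexed.index.names

# Column-style access on the indexed frame raises KeyError: 'dealer_id'.
try:
    indexed["dealer_id"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# reset_index() turns the index level back into a regular column.
restored = indexed.reset_index()
restored_has_column = "dealer_id" in restored.columns

print(in_columns, in_index, raised_key_error, restored_has_column)
```

If the frame reaching turnover looks like `indexed` above, calling reset_index() on it before selecting df['dealer_id'] is another way to get the column back.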

Upvotes: 1
