Curtis

Reputation: 459

KeyError for column that is in Pandas dataframe

I'm having an issue that I can't seem to understand. I've written a function that takes a dataframe as input and performs a number of cleaning steps on it. When I run the function, I get the error message KeyError: ('amount', 'occurred at index date'). This doesn't make sense to me, because amount is a column in my dataframe.

Here is some code that creates a subset of the data and reproduces the error:

import string

import pandas as pd

data = pd.DataFrame.from_dict({"date": ["10/31/2019","10/27/2019"], "amount": [-13.3, -6421.25], "vendor": ["publix","verizon"]})

#create cleaning function for dataframe
def cleaning_func(x):

    #convert the amounts to positive numbers
    x['amount'] =  x['amount'] * -1

    #convert dates to datetime for subsetting purposes
    x['date'] = pd.to_datetime(x['date'])

    #begin removing certain strings
    x['vendor'] = x['vendor'].str.replace("PURCHASE AUTHORIZED ON ","")
    x['vendor'] = x['vendor'].str.replace("[0-9]","")
    x['vendor'] = x['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /","")

    #build table of punctuation and remove from vendor strings
    table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
    x['vendor'] = x['vendor'].str.translate(table)

    return x
clean_data = data.apply(cleaning_func)

If someone could shed some light on why this error appears I would appreciate it.

Upvotes: 0

Views: 521

Answers (1)

Erfan

Reputation: 42916

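The error occurs because DataFrame.apply calls your function once per column by default, passing each column as a Series. On the first call x is the date column, so x['amount'] looks for a row label 'amount' in that Series and raises the KeyError (hence "occurred at index date"). Don't use apply here; it is also slow, because it basically loops over your dataframe. Just pass your dataframe to the function directly and let it return the cleaned-up dataframe. That way each step uses the vectorized methods over the whole column.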

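import string

import pandas as pd

def cleaning_func(df):
    # note: this modifies df in place and also returns it

    # convert the amounts to positive numbers
    df['amount'] = df['amount'] * -1

    # convert dates to datetime for subsetting purposes
    df['date'] = pd.to_datetime(df['date'])

    # begin removing certain strings
    # regex=False matches the text literally (assumed to be the intent here,
    # since "$" and "." would otherwise be treated as regex metacharacters)
    df['vendor'] = df['vendor'].str.replace("PURCHASE AUTHORIZED ON ", "", regex=False)
    # "[0-9]" is a pattern, so it needs regex=True (the default became False in pandas 2.0)
    df['vendor'] = df['vendor'].str.replace("[0-9]", "", regex=True)
    df['vendor'] = df['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /", "", regex=False)

    # build table of punctuation and remove from vendor strings
    table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
    df['vendor'] = df['vendor'].str.translate(table)

    return df

clean_df = cleaning_func(data)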
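
To see why the apply call in the question raises the KeyError, it can help to look at what DataFrame.apply actually passes to the function. The following is a minimal sketch using the question's sample data; inspect and seen are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Same sample data as in the question.
data = pd.DataFrame({
    "date": ["10/31/2019", "10/27/2019"],
    "amount": [-13.3, -6421.25],
    "vendor": ["publix", "verizon"],
})

# Hypothetical helper: record what DataFrame.apply hands to the function.
seen = []

def inspect(x):
    # Store the argument's type, its name, and its index labels.
    seen.append((type(x).__name__, x.name, list(x.index)))
    return x

# With the default axis=0, the function is called with one column at a time.
data.apply(inspect)

# Looking up 'amount' inside a single column searches the row labels (0 and 1),
# not the dataframe's column names, so it raises KeyError.
first_column = data["date"]
try:
    first_column["amount"]
    raised = False
except KeyError:
    raised = True

print(seen)
print("KeyError raised:", raised)
```

Every entry in seen is a Series named after one of the columns, which is why x['amount'] inside the function fails as soon as the date column is processed.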

Upvotes: 1
