Reputation: 459
I'm having an issue that I can't seem to understand. I've written a function that takes a dataframe as the input and then performs a number of cleaning steps on it. When I run the function I get the error message KeyError: ('amount', 'occurred at index date')
. This doesn't make sense to me because amount
is a column in my dataframe .
Here is some code with a subset of the data created:
data = pd.DataFrame.from_dict({"date": ["10/31/2019","10/27/2019"], "amount": [-13.3, -6421.25], "vendor": ["publix","verizon"]})
#create cleaning function for dataframe
def cleaning_func(x):
#convert the amounts to positive numbers
x['amount'] = x['amount'] * -1
#convert dates to datetime for subsetting purposes
x['date'] = pd.to_datetime(x['date'])
#begin removing certain strings
x['vendor'] = x['vendor'].str.replace("PURCHASE AUTHORIZED ON ","")
x['vendor'] = x['vendor'].str.replace("[0-9]","")
x['vendor'] = x['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /","")
#build table of punctuation and remove from vendor strings
table = str.maketrans(dict.fromkeys(string.punctuation)) # OR {key: None for key in string.punctuation}
x['vendor'] = x['vendor'].str.translate(table)
return x
clean_data = data.apply(cleaning_func)
If someone could shed some light on why this error appears I would appreciate it.
Upvotes: 0
Views: 521
Reputation: 42916
Don't use apply
here, it's slow and basically loops over your dataframe. Just pass the function your data and let it return a cleaned up dataframe, this way it will use the vectorized methods over the whole column.
def cleaning_func(df):
#convert the amounts to positive numbers
df['amount'] = df['amount'] * -1
#convert dates to datetime for subsetting purposes
df['date'] = pd.to_datetime(df['date'])
#begin removing certain strings
df['vendor'] = df['vendor'].str.replace("PURCHASE AUTHORIZED ON ","")
df['vendor'] = df['vendor'].str.replace("[0-9]","")
df['vendor'] = df['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /","")
#build table of punctuation and remove from vendor strings
table = str.maketrans(dict.fromkeys(string.punctuation)) # OR {key: None for key in string.punctuation}
df['vendor'] = df['vendor'].str.translate(table)
return df
clean_df = cleaning_func(data)
Upvotes: 1