Reputation: 4250
I encountered a problem just like this Python pandas merge keyerror but none of the solutions worked for me. I assume some of the code isn't relevant but I just included it just in case. I basically want to calculate some gradients on some of the rows from one column and join it back to the original dataframe
import numpy as np
from scipy import signal
df = item_loc_df_dict['key_name']
prob_series = df['col1'].astype(float).values[5:(len(df['col1']) - 5)]
week_series = df['date_col'].values[5:(len(df['date_col']) - 5)]
gradient = np.square(np.gradient(prob_series))
win = signal.windows.hann(8)
gradients = signal.convolve(gradient, win, mode='same') / sum(win)
grad_df = pd.DataFrame({'date_col': week_series, 'peak_grads': gradients})
gradient_mean = np.mean(gradients)
gradient_std = np.std(gradients)
min_height = gradient_mean + gradient_std
peak_indices = signal.find_peaks(gradients, height=min_height)[0]
peak_grad_df = grad_df.iloc[peak_indices].reset_index(drop=True)
peak_grad_df['date_col'] = peak_grad_df['date_col'].astype('datetime64[ns]')
df['date_col'] = df['date_col'].astype('datetime64[ns]')
joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left')
joined_df['peak_grads'].fillna(0, inplace=True)
I would get KeyError: peak_grads
on the last line, so the merge wasn't including the column from my right dataframe. The left dataframe df
has about 200 columns, so I tried adding
df = df[['date_col', 'col2', 'col3']]
to the top of my code, and then the KeyError
went away and my merge worked as expected. Why does this merge only work when I reduce the number of columns from my left dataframe?
Upvotes: 0
Views: 1401
Reputation: 86
Use suffixes parameter or print columns (joined_df.columns) to test @christina Stebbins hypothesis
joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left', suffixes=(False, False)))
This must raise "ValueError: columns overlap but no suffix specified: Index(['peak_grads']"
Upvotes: 1