python pandas merge missing columns from right dataframe

Question

I encountered a problem just like this Python pandas merge keyerror but none of the solutions worked for me. I assume some of the code isn't relevant but I just included it just in case. I basically want to calculate some gradients on some of the rows from one column and join it back to the original dataframe

import numpy as np
from scipy import signal

df = item_loc_df_dict['key_name']
prob_series = df['col1'].astype(float).values[5:(len(df['col1']) - 5)]
week_series = df['date_col'].values[5:(len(df['date_col']) - 5)]

gradient = np.square(np.gradient(prob_series))
win = signal.windows.hann(8)
gradients = signal.convolve(gradient, win, mode='same') / sum(win)
grad_df = pd.DataFrame({'date_col': week_series, 'peak_grads': gradients})

gradient_mean = np.mean(gradients)
gradient_std = np.std(gradients)
min_height = gradient_mean + gradient_std
peak_indices = signal.find_peaks(gradients, height=min_height)[0]
peak_grad_df = grad_df.iloc[peak_indices].reset_index(drop=True)

peak_grad_df['date_col'] = peak_grad_df['date_col'].astype('datetime64[ns]')
df['date_col'] = df['date_col'].astype('datetime64[ns]')

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left')
joined_df['peak_grads'].fillna(0, inplace=True)

I would get KeyError: peak_grads on the last line, so the merge wasn't including the column from my right dataframe. The left dataframe df has about 200 columns, so I tried adding

df = df[['date_col', 'col2', 'col3']]

to the top of my code, and then the KeyError went away and my merge worked as expected. Why does this merge only work when I reduce the number of columns from my left dataframe?

python pandas merge missing columns from right dataframe

Answers (1)

Related Questions