gary69
gary69

Reputation: 4250

python pandas merge missing columns from right dataframe

I encountered a problem just like this Python pandas merge keyerror but none of the solutions worked for me. I assume some of the code isn't relevant but I just included it just in case. I basically want to calculate some gradients on some of the rows from one column and join it back to the original dataframe

import numpy as np
from scipy import signal

df = item_loc_df_dict['key_name']
prob_series = df['col1'].astype(float).values[5:(len(df['col1']) - 5)]
week_series = df['date_col'].values[5:(len(df['date_col']) - 5)]

gradient = np.square(np.gradient(prob_series))
win = signal.windows.hann(8)
gradients = signal.convolve(gradient, win, mode='same') / sum(win)
grad_df = pd.DataFrame({'date_col': week_series, 'peak_grads': gradients})

gradient_mean = np.mean(gradients)
gradient_std = np.std(gradients)
min_height = gradient_mean + gradient_std
peak_indices = signal.find_peaks(gradients, height=min_height)[0]
peak_grad_df = grad_df.iloc[peak_indices].reset_index(drop=True)

peak_grad_df['date_col'] = peak_grad_df['date_col'].astype('datetime64[ns]')
df['date_col'] = df['date_col'].astype('datetime64[ns]')

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left')
joined_df['peak_grads'].fillna(0, inplace=True)

I would get KeyError: peak_grads on the last line, so the merge wasn't including the column from my right dataframe. The left dataframe df has about 200 columns, so I tried adding

df = df[['date_col', 'col2', 'col3']]

to the top of my code, and then the KeyError went away and my merge worked as expected. Why does this merge only work when I reduce the number of columns from my left dataframe?

Upvotes: 0

Views: 1401

Answers (1)

Fernanda Milagres
Fernanda Milagres

Reputation: 86

Use suffixes parameter or print columns (joined_df.columns) to test @christina Stebbins hypothesis

joined_df = df.merge(peak_grad_df, left_on=['date_col'], right_on=['date_col'], how='left', suffixes=(False, False)))

This must raise "ValueError: columns overlap but no suffix specified: Index(['peak_grads']"

Upvotes: 1

Related Questions