Je Je
Je Je

Reputation: 562

Calculate min max mean median for pandas DataFrame groupby Columns and join results

I have a pandas DataFrame, and i want to perform min,max,mean,median calculation on one column, Having grouped them using column A, B and C. I then want to merge the results to the initial DataFrame. I have success using the bellow when i calc the median:

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
median_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.median)
median_px.name = 'Median Px/SQM'
result_median_df = pandas_df.join(median_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left")
result_median_df.to_csv(path_or_buf='median.csv')

But when i try and calc the min and max AND add it to the DataFrame, I have the following error:

ValueError: columns overlap but no suffix specified: Index(['Px/SQM'], dtype='object')

code used for min or max:

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
min_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.min)
min_px.name = 'Min Px/SQM'
result_min_df = pandas_df.join(min_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left")
result_min_df.to_csv(path_or_buf='min_px.csv')

I have tried using suffix and it will work, but I would like to use my own column fullname. Or Am I bound to rename after use?

As well, I believe there is a way to make the request as an array: [np.min, np.mean, np.median, np.max], with a renaming column using agg but I couldn't make it work.

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
min_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.min)
min_px.name = 'Min Px/SQM'
result_min_df = pandas_df.join(min_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left", lsuffix="_min")
result_min_df.to_csv(path_or_buf='min_px.csv')

After having received great answer, just a comment.

I was trying to use the code shown here which was triggering a lot of warnings and was slower than the solution proposed:

df1=pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent']).agg({'Px/SQM':                                                                   {'Min': np.min,'Max': np.max,'Mean': np.mean,'Median': np.median                                                                  }} ).reset_index()df3= pd.merge(pandas_df, df1, on=['ZIP', 'Updated', 'Buy/Rent'], how='left')

Upvotes: 4

Views: 2415

Answers (1)

BENY
BENY

Reputation: 323306

When you need adding columns to the original dfs , you can always using transform

g=pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])['Px/SQM']

pandas_df['Max']=g.transform('max')
pandas_df['Min']=g.transform('min')
pandas_df['Median']=g.transform(np.median)
pandas_df['Mean']=g.transform('mean')

Upvotes: 2

Related Questions