How to merge two pandas dataframes based on a value in one row and with different column names?

Question

I have two pandas dataframes that I want to merge. The dataframes are of different sizes so I only want those that appear in df1 to be kept - some students only appear in one of df1 or df2.

df1 has the headings ['student', 'week1_count', 'week1_mean', ..., 'week11_count', 'week11_mean'] and is initialised with all cells except the 'student' column to be zero.

df2 has the headings ['student', 'week', 'count', 'mean'] and is populated with the corresponding 'student'. 'week' is an int between 1-11, and 'count' and 'mean' are the corresponding floats.

What I want to do is for a given student in df1 and df2, for a given week, take the corresponding 'count' and 'mean' value and put it in df1 in the corresponding column. For example, a 'week' value of 1 would mean that the value in 'count' and 'mean' in df2 would be put in 'week1_count' and 'week1_mean' respectively in df1.

Regarding the weeks I have been looping through range(11) and create a subset dataframe but wonder whether there is a quicker way.

i.e.

df1:
    student week1_count week1_mean week2_count week2_mean ... 
      '0'        0           0          0            0    ...
      '2'        0           0          0            0    ...
      '3'        0           0          0            0    ...
      .
      .
      .
      '500'      0           0          0            0    ...
      '541'      0           0          0            0    ...
      '542'      0           0          0            0    ...

and

df2:
    student week count mean
      '0'     1    5    6.5
      '1'     1    3    7.0
      '2'     1    2    8.2
      '2'     2    10   15.1
      .
      .
      .
     '500'    2    12   4.3
     '540'    4    1    3.0
     '542'    1    4    1.2
     '542'    2    9    5.2

so the intended result

df_result:
    student week1_count week1_mean week2_count week2_mean ... 
      '0'        5           6.5        0            0    ...
      '2'        2           8.2        10           15.1 ...
      '7'        0           0          0            0    ...
      .
      .
      .
      '500'      0           0          12           4.3  ...
      '541'      0           0          0            0    ...
      '542'      4           1.2        9            5.2  ...

I have tried various routines - none of which have worked as intended - in pandas such as:

merge: using a 'left' join as I want the format of df1. I tried renaming the columns in df2 to match column names.
join
concat
update: tried initialising all cells in df1 to np.nan rather than 0.0 and then use df1.update(df2) (after renaming cols in df2) to update all nan values with the intended one
tried to just set the values: i.e. something like df1[rows_in_both][['week1_count','week1_mean']] = df2[rows_in_both][['count','mean']] but that didn't work either

BENY · Accepted Answer

This is more like a update problem rather than merge

s=df2.pivot(index='student',columns='week',values=['count','mean'])# pivot df2 to format it to df1 like . 
s.columns.map('week{0[1]}_{0[0]}'.format) # modify the column
Out[645]: 
Index(['week1_count', 'week2_count', 'week4_count', 'week1_mean', 'week2_mean',
       'week4_mean'],
      dtype='object')
s.columns=s.columns.map('week{0[1]}_{0[0]}'.format)

Then we doing update

df1=df1.set_index('student')
df1=df1.update(s)

How to merge two pandas dataframes based on a value in one row and with different column names?

Answers (1)

Related Questions