Paddy Roddy
Paddy Roddy

Reputation: 145

How to merge two pandas dataframes based on a value in one row and with different column names?

I have two pandas dataframes that I want to merge. The dataframes are of different sizes so I only want those that appear in df1 to be kept - some students only appear in one of df1 or df2.

df1 has the headings ['student', 'week1_count', 'week1_mean', ..., 'week11_count', 'week11_mean'] and is initialised with all cells except the 'student' column to be zero.

df2 has the headings ['student', 'week', 'count', 'mean'] and is populated with the corresponding 'student'. 'week' is an int between 1-11, and 'count' and 'mean' are the corresponding floats.

What I want to do is for a given student in df1 and df2, for a given week, take the corresponding 'count' and 'mean' value and put it in df1 in the corresponding column. For example, a 'week' value of 1 would mean that the value in 'count' and 'mean' in df2 would be put in 'week1_count' and 'week1_mean' respectively in df1.

Regarding the weeks I have been looping through range(11) and create a subset dataframe but wonder whether there is a quicker way.

i.e.

df1:
    student week1_count week1_mean week2_count week2_mean ... 
      '0'        0           0          0            0    ...
      '2'        0           0          0            0    ...
      '3'        0           0          0            0    ...
      .
      .
      .
      '500'      0           0          0            0    ...
      '541'      0           0          0            0    ...
      '542'      0           0          0            0    ... 

and

df2:
    student week count mean
      '0'     1    5    6.5
      '1'     1    3    7.0
      '2'     1    2    8.2
      '2'     2    10   15.1
      .
      .
      .
     '500'    2    12   4.3
     '540'    4    1    3.0
     '542'    1    4    1.2
     '542'    2    9    5.2

so the intended result

df_result:
    student week1_count week1_mean week2_count week2_mean ... 
      '0'        5           6.5        0            0    ...
      '2'        2           8.2        10           15.1 ...
      '7'        0           0          0            0    ...
      .
      .
      .
      '500'      0           0          12           4.3  ...
      '541'      0           0          0            0    ...
      '542'      4           1.2        9            5.2  ... 

I have tried various routines - none of which have worked as intended - in pandas such as:

Upvotes: 0

Views: 83

Answers (1)

BENY
BENY

Reputation: 323226

This is more like a update problem rather than merge

s=df2.pivot(index='student',columns='week',values=['count','mean'])# pivot df2 to format it to df1 like . 
s.columns.map('week{0[1]}_{0[0]}'.format) # modify the column
Out[645]: 
Index(['week1_count', 'week2_count', 'week4_count', 'week1_mean', 'week2_mean',
       'week4_mean'],
      dtype='object')
s.columns=s.columns.map('week{0[1]}_{0[0]}'.format) 

Then we doing update

df1=df1.set_index('student')
df1=df1.update(s)

Upvotes: 1

Related Questions