Reputation: 145
I have two pandas dataframes that I want to merge. The dataframes are of different sizes so I only want those that appear in df1
to be kept - some students only appear in one of df1
or df2
.
df1
has the headings ['student', 'week1_count', 'week1_mean', ..., 'week11_count', 'week11_mean']
and is initialised with all cells except the 'student'
column to be zero.
df2
has the headings ['student', 'week', 'count', 'mean']
and is populated with the corresponding 'student'
. 'week'
is an int between 1-11, and 'count'
and 'mean'
are the corresponding floats.
What I want to do is for a given student in df1
and df2
, for a given week, take the corresponding 'count'
and 'mean'
value and put it in df1
in the corresponding column. For example, a 'week'
value of 1
would mean that the value in 'count'
and 'mean'
in df2
would be put in 'week1_count'
and 'week1_mean'
respectively in df1
.
Regarding the weeks I have been looping through range(11)
and create a subset dataframe but wonder whether there is a quicker way.
i.e.
df1:
student week1_count week1_mean week2_count week2_mean ...
'0' 0 0 0 0 ...
'2' 0 0 0 0 ...
'3' 0 0 0 0 ...
.
.
.
'500' 0 0 0 0 ...
'541' 0 0 0 0 ...
'542' 0 0 0 0 ...
and
df2:
student week count mean
'0' 1 5 6.5
'1' 1 3 7.0
'2' 1 2 8.2
'2' 2 10 15.1
.
.
.
'500' 2 12 4.3
'540' 4 1 3.0
'542' 1 4 1.2
'542' 2 9 5.2
so the intended result
df_result:
student week1_count week1_mean week2_count week2_mean ...
'0' 5 6.5 0 0 ...
'2' 2 8.2 10 15.1 ...
'7' 0 0 0 0 ...
.
.
.
'500' 0 0 12 4.3 ...
'541' 0 0 0 0 ...
'542' 4 1.2 9 5.2 ...
I have tried various routines - none of which have worked as intended - in pandas such as:
df1
. I tried renaming the columns in df2
to match column names.df1
to np.nan
rather than 0.0
and then use df1.update(df2)
(after renaming cols in df2
) to update all nan values with the intended onedf1[rows_in_both][['week1_count','week1_mean']] = df2[rows_in_both][['count','mean']]
but that didn't work eitherUpvotes: 0
Views: 83
Reputation: 323226
This is more like a update
problem rather than merge
s=df2.pivot(index='student',columns='week',values=['count','mean'])# pivot df2 to format it to df1 like .
s.columns.map('week{0[1]}_{0[0]}'.format) # modify the column
Out[645]:
Index(['week1_count', 'week2_count', 'week4_count', 'week1_mean', 'week2_mean',
'week4_mean'],
dtype='object')
s.columns=s.columns.map('week{0[1]}_{0[0]}'.format)
Then we doing update
df1=df1.set_index('student')
df1=df1.update(s)
Upvotes: 1