Arnold Klein
Arnold Klein

Reputation: 3086

Merge two dataframes within a group with GroupBy

I have two dataframes and need to merge them based on a date, but the merge should be done for each group (participant_id) separately.

df1: 
   response_date  summary  epis_mark  participant_id
0     2012-01-04      0.0      False              13
1     2012-01-11      0.0      False              13
2     2012-01-19      0.0      False              13
3     2012-01-29      0.0      False              13
4     2012-02-02      0.0      False              13
0     2012-01-02      8.0       True              14
1     2012-01-10      5.0      False              14
2     2012-01-18      2.0      False              14
3     2012-01-24      1.0      False              14
4     2012-01-31      2.0      False              14
0     2012-01-07      4.0      False              17
1     2012-01-11      NaN      False              17
2     2012-01-18      4.0      False              17
3     2012-01-25      NaN      False              17
4     2012-02-01      NaN      False              17



  df2:
   response_date  summary  epis_mark  participant_id
0     2012-01-04     17.0       True              13
1     2012-01-11     18.0       True              13
2     2012-01-19     16.0       True              13
3     2012-01-29     15.0       True              13
4     2012-02-02     15.0       True              13
0     2012-01-02     12.0       True              14
1     2012-01-10      8.0       True              14
2     2012-01-18     21.0       True              14
3     2012-01-24     19.0       True              14
4     2012-01-31     20.0       True              14
0     2012-01-04      NaN      False              17
1     2012-01-11      NaN      False              17
2     2012-01-18      NaN      False              17
3     2012-01-25      NaN      False              17
4     2012-02-01      NaN      False              17

I need to get one dataframe (wide), where merging is done on response date for each participant_id independently. Something like:

>> pd.merge(df1[df1.participant_id == i], df2[df2.participant_id == i], on='response_date', how='outer')

But without looping over i and using groupby.

Upvotes: 2

Views: 2516

Answers (2)

Jairia
Jairia

Reputation: 11

I'm not really sure if I get you correctly.

You could try the following:

pd.merge(df1, df2, on=['response date', 'participant_id'], how='outer')

Upvotes: 1

unutbu
unutbu

Reputation: 879291

Merge on both response_date and participant_id:

In [75]: pd.merge(df1, df2, on=['response_date', 'participant_id'], how='outer')
Out[75]: 
   response_date  summary_x epis_mark_x  participant_id  summary_y epis_mark_y
0     2012-01-04        0.0       False              13       17.0        True
1     2012-01-11        0.0       False              13       18.0        True
2     2012-01-19        0.0       False              13       16.0        True
3     2012-01-29        0.0       False              13       15.0        True
4     2012-02-02        0.0       False              13       15.0        True
5     2012-01-02        8.0        True              14       12.0        True
6     2012-01-10        5.0       False              14        8.0        True
7     2012-01-18        2.0       False              14       21.0        True
8     2012-01-24        1.0       False              14       19.0        True
9     2012-01-31        2.0       False              14       20.0        True
10    2012-01-07        4.0       False              17        NaN         NaN
11    2012-01-11        NaN       False              17        NaN       False
12    2012-01-18        4.0       False              17        NaN       False
13    2012-01-25        NaN       False              17        NaN       False
14    2012-02-01        NaN       False              17        NaN       False
15    2012-01-04        NaN         NaN              17        NaN       False

Upvotes: 2

Related Questions