Reputation: 19
df1:
id score
1000 174
1001 181
1002 162
1003 182
1005 97
... ...
3313 95
3316 91
3322 151
1928 rows × 2 columns
df2:
date id
01/03/2019 1002
01/03/2019 1004
01/03/2019 1013
01/03/2019 1014
01/03/2019 1015
... ...
31/08/2019 3584
31/08/2019 3585
31/08/2019 3586
31/08/2019 3587
31/08/2019 3588
355775 rows × 3 columns
I want an output with all the ids and scores from df1, merged with only the relevant dates from df2.
My code is pd.merge(df1, df2, how='left', on='id'), but for some reason I'm also getting back the dates that are not relevant.
What is wrong here?
Upvotes: 0
Views: 626
Reputation: 317
Based on the comments, here is the answer. If id is unique in df1 but not in df2, pandas has no way of knowing which date in df2 is the “correct” one. A left merge on id therefore returns one row for every matching row in df2, so the same score is repeated for each date of a given id.
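To make the row multiplication concrete, here is a minimal sketch on toy data. The values are made up for illustration and are not the asker's real frames; only the column names id, score and date come from the question.

```python
import pandas as pd

# Toy data (illustrative only): id is unique in df1 but repeated in df2.
df1 = pd.DataFrame({"id": [1000, 1002, 1005], "score": [174, 162, 97]})
df2 = pd.DataFrame({
    "date": ["01/03/2019", "15/04/2019", "31/08/2019", "01/03/2019"],
    "id": [1002, 1002, 1002, 1005],
})

# Same call as in the question.
merged = pd.merge(df1, df2, how="left", on="id")

# id 1002 appears three times in df2, so its score is repeated on three rows.
# id 1000 has no match in df2, so it is kept once with a missing date.
print(merged)
print(len(df1), "rows in df1 ->", len(merged), "rows after the merge")
```

The merge is doing what a left join is supposed to do: it keeps every row of df1 and attaches every matching row of df2. Nothing in the inputs tells it which of the three dates for id 1002 is the relevant one.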
I suspect you would need a third dataframe where you have information that matches the (presumably) best score to the number of attempts, or something similar.
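As a rough sketch of what such a third dataframe could look like: the frame below, called relevant here, is purely hypothetical and its contents are invented. It assumes you have some source that says which date each id's score belongs to. With it, the merge becomes one-to-one and no extra dates appear.

```python
import pandas as pd

# Toy data (illustrative only).
df1 = pd.DataFrame({"id": [1002, 1005], "score": [162, 97]})

# Hypothetical lookup table: one row per id, naming the date its score
# belongs to. This information is NOT in the question; it is assumed here.
relevant = pd.DataFrame({
    "id": [1002, 1005],
    "date": ["15/04/2019", "01/03/2019"],
})

# validate="one_to_one" makes pandas raise if id is duplicated on either side.
result = pd.merge(df1, relevant, how="left", on="id", validate="one_to_one")
print(result)
```

The point is only that the extra information has to come from somewhere; how the lookup is built depends on data the question does not show.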
This is not a coding problem but a data availability problem. Your original code is fine; it just needs the right inputs.
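If you want to confirm that this is what is happening in your data, one way is to check key uniqueness and let the merge enforce it through its validate parameter. A minimal sketch, again on made-up toy data:

```python
import pandas as pd

# Toy data (illustrative only): df2 repeats id 1002.
df1 = pd.DataFrame({"id": [1000, 1002], "score": [174, 162]})
df2 = pd.DataFrame({
    "date": ["01/03/2019", "15/04/2019", "01/03/2019"],
    "id": [1002, 1002, 1000],
})

print("id unique in df1:", df1["id"].is_unique)
print("id unique in df2:", df2["id"].is_unique)

# Ask pandas to fail loudly instead of silently multiplying rows.
raised = False
try:
    pd.merge(df1, df2, how="left", on="id", validate="one_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("Merge is not one-to-one:", exc)
```

If the second check prints False and the merge raises, the ids in df2 are duplicated and the extra dates are expected behaviour rather than a bug in the merge call.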
Upvotes: 1