xRay

Reputation: 19

python - doing a left merge and getting wrong output

df1:

id      score
1000    174
1001    181
1002    162
1003    182
1005    97
...     ...
3313    95
3316    91
3322    151

1928 rows × 2 columns

df2:

date        id
01/03/2019  1002    
01/03/2019  1004    
01/03/2019  1013    
01/03/2019  1014
01/03/2019  1015
...         ... 
31/08/2019  3584
31/08/2019  3585
31/08/2019  3586
31/08/2019  3587
31/08/2019  3588
355775 rows × 3 columns

I want an output with all the ids and scores from df1, merged with only the relevant dates from df2.

My code is pd.merge(df1, df2, how='left', on='id'), and for some reason I'm also getting back the dates that are not relevant.

What is wrong here?

Upvotes: 0

Views: 626

Answers (1)

malvoisen

Reputation: 317

Based on the comments, here is the answer. If id is unique in df1 but not in df2, pandas has no way of knowing the “correct” date from df2, so every date that df2 holds for a given id is merged onto that id's single score.
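
To illustrate the behaviour described above, here is a minimal sketch on toy data. The values are invented for the example and are not the asker's real frames; only the column names (id, score, date) come from the question.

```python
import pandas as pd

# Toy data (invented): id is unique in df1 but repeated in df2.
df1 = pd.DataFrame({"id": [1002, 1003], "score": [162, 182]})
df2 = pd.DataFrame({
    "date": ["01/03/2019", "02/03/2019", "01/03/2019"],
    "id": [1002, 1002, 1004],
})

# Same call as in the question.
merged = pd.merge(df1, df2, how="left", on="id")
print(merged)

# id 1002 appears once per matching date in df2, each time with the same score.
# id 1003 has no match in df2, so it is kept with a missing date.
# id 1004 exists only in df2, so the left merge drops it.
```

The left merge keeps every row of df1, but it also keeps every matching row of df2, which is why the result can have more rows than df1.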

I suspect you would need a third dataframe where you have information that matches the (presumably) best score to the number of attempts, or something similar.

This is not a coding problem but a data-availability one. In fact, your original code is fine; it just needs the right inputs.
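
If you can state a rule for which date is the relevant one, you can reduce df2 to one row per id before merging. The sketch below assumes, purely for illustration, that the latest date per id is the one wanted; that rule is hypothetical and does not come from the question. The toy values are invented as well.

```python
import pandas as pd

# Toy data (invented), same shape as in the question.
df1 = pd.DataFrame({"id": [1002, 1003], "score": [162, 182]})
df2 = pd.DataFrame({
    "date": ["01/03/2019", "15/04/2019", "01/03/2019"],
    "id": [1002, 1002, 1004],
})

# Parse the day/month/year strings so that sorting is chronological.
df2["date"] = pd.to_datetime(df2["date"], format="%d/%m/%Y")

# Hypothetical rule: keep only the latest date for each id.
latest = df2.sort_values("date").drop_duplicates("id", keep="last")

# validate="one_to_one" makes pandas raise a MergeError if id is
# duplicated on either side, so unexpected row multiplication fails loudly.
result = pd.merge(df1, latest, how="left", on="id", validate="one_to_one")
print(result)
```

With a different rule (earliest date, or a date taken from a third dataframe), only the line that builds latest would change; the merge call stays the same.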

Upvotes: 1
