Reputation: 571
When merging 2 datasets, I get NAN on the second raw for each category.
This is a toy dataset to illustrate the problem:
df1=pd.DataFrame({'Num':[1,1,2,3,3],
'date':['1995-09-01','1995-10-04','1995-11-07','1995-11-10','1995-11-25'],
'A':[42.5,40,38,40,28],
'B': [13.3,12.3,12.2,11,10]})
df2=pd.DataFrame({'Num':[1,1,1,1,2,2,3,3,3,3],
'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-25'],
'C':[42.5,39.5,37.2,40,41,38,38.2,39.7,40,28],
'D': [13.3,12.8,12.1,12.3,13.3,12.2,12.4,12.8,11,10]})
After running the following code:
data = pd.merge(df1, df2, how='left', left_on=['Num','date'], right_on = ['Num','date'])
Here is what I should obtain (which I do with this toy dataset)
However, with my real dataset, I obtain :
I have checked datatypes and they match, and no nulls or nans show up on keys. Num is formated as int64 and date as datetime64
Upvotes: 1
Views: 217
Reputation: 571
If you ever face a situation like above described problem, here's what I did to solve the situation :
If the problem persists, check the rows of the column/s (keys) you are willing to do the merge on.
In my case, "num" key was ok. However, "date" key presented different rows in df2 compared to df1. This explains that after the merge, some rows would contain data (on the right part) and some other wouldn't.
Given the merge type I had chosen (how="left):
Upvotes: 1