josepmaria
josepmaria

Reputation: 571

Pandas merge 2 keys, second key return first raw correct second Nan

When merging 2 datasets, I get NAN on the second raw for each category.

This is a toy dataset to illustrate the problem:

df1=pd.DataFrame({'Num':[1,1,2,3,3],
                  'date':['1995-09-01','1995-10-04','1995-11-07','1995-11-10','1995-11-25'],
                  'A':[42.5,40,38,40,28],
                  'B': [13.3,12.3,12.2,11,10]})

enter image description here

df2=pd.DataFrame({'Num':[1,1,1,1,2,2,3,3,3,3],
                  'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-25'],
                  'C':[42.5,39.5,37.2,40,41,38,38.2,39.7,40,28],
                  'D': [13.3,12.8,12.1,12.3,13.3,12.2,12.4,12.8,11,10]})

enter image description here

After running the following code:

data = pd.merge(df1, df2,  how='left', left_on=['Num','date'], right_on = ['Num','date'])

Here is what I should obtain (which I do with this toy dataset) enter image description here

However, with my real dataset, I obtain :

enter image description here

I have checked datatypes and they match, and no nulls or nans show up on keys. Num is formated as int64 and date as datetime64

Upvotes: 1

Views: 217

Answers (2)

josepmaria
josepmaria

Reputation: 571

If you ever face a situation like above described problem, here's what I did to solve the situation :

  • check dtypes for the common keys match on both dataframes

If the problem persists, check the rows of the column/s (keys) you are willing to do the merge on.

In my case, "num" key was ok. However, "date" key presented different rows in df2 compared to df1. This explains that after the merge, some rows would contain data (on the right part) and some other wouldn't.

Given the merge type I had chosen (how="left):

  • resulting shape of the merge dataframe was correct.
  • all rows contained correct info on for the left dataframe (df1)
  • some rows on the right side of the merged dataframe would contain NAN given the lack of match (of one of the 2 keys) within the first and the second data frame.

Upvotes: 1

Giovanni Frison
Giovanni Frison

Reputation: 688

I would go with:

df1.merge(df2, on=['Num','date'])

Upvotes: 0

Related Questions