Amol Desai
Amol Desai

Reputation: 117

Combining DataFrames without Nans

I have two df. One maps values to IDs. The other one has multiple entries of these IDs. I want to have a df with the first dataframe with the values assigned to the respective IDs.

 df1 = 
    Val1   Val2  Val3
x   1000    2     0
y   2000    3     9
z   3000    1     8

df2=
      foo          ID    bar
0    something     y      a
1    nothing       y      b
2    everything    x      c
3    who           z      d

result=
      foo          ID    bar    Val1     Val2     Val3
0    something     y      a     2000      3        9
1    nothing       y      b     2000      3        9
2    everything    x      c     1000      2        0
3    who           z      d     3000      1        8

I've tried merge and join (obviously incorrectly) but I am getting a bunch of NaNs when I do that. It appears that I am getting NaNs on every alternate ID. I have also tried indexing both DFs by ID but that didn't seem to help either. I am obviously missing something that I am guessing is a core functionality but I can't get my head around it.

Upvotes: 1

Views: 91

Answers (1)

bdiamante
bdiamante

Reputation: 17570

merge and join could both get you the result DataFrame you want. Since one of your DataFrames is indexed (by ID) and the other has just a integer index, merge is the logical choice.

Merge:

# use ID as the column to join on in df2 and the index of df1 
result = df2.merge(df1, left_on="ID", right_index=True, how="inner")

Join:

df2.set_index("ID", inplace=True)  # index df2 in place so you can use join, which merges by index by default
result = df2.join(df1, how="inner") # join df1 by index

Upvotes: 3

Related Questions