Fourier
Fourier

Reputation: 2993

Merge two data frames on multiple values

I have two data frames which look like this

df1

              name  ID      abb
     0         foo  251803  I
     1         bar  376811  R
     2         baz  174254  Q
     3      foofoo  337144  IRQ
     4      barbar  306521  IQ

df2

          abb    comment
     0      I       fine
     1      R     repeat
     2      Q      other

I am trying to use pandas merge to join the two data frames and simply assign the comment column in the second data frame to the first based on the abb column in the following way:

df1.merge(df2, how='inner', on='abb')

resulting in:

              name  ID      abb  comment
     0         foo  251803  I       fine
     1         bar  376811  R     repeat
     2         baz  174254  Q      other

This works well for the unique one letter identifiers in abb. However, it obviously fails for more than one character.

I tried to use list on the abb column in first data frame but this results in a KeyError.

What I would like to do is the following.

1) Seperate the rows containing more than one character in this column into several rows

2) Merge the data frames

3) Optionally: Combine the rows again

Upvotes: 1

Views: 1401

Answers (2)

piRSquared
piRSquared

Reputation: 294498

See this answer for various ways to explode on a column

rows = []
for i, row in df1.iterrows():
    for a in row.abb:
        rows.append([row['ID'], a, row['name']])

df11 = pd.DataFrame(rows, columns=df1.columns)

df11.merge(df2)

enter image description here

Upvotes: 1

jezrael
jezrael

Reputation: 863291

Use join:

print (df1)
     name      ID  abb
0     foo  251803    I
1     bar  376811    R
2     baz  174254    Q
3  foofoo  337144  IRQ
4  barbar  306521   IQ

#each character to df, which is stacked to Series
s = df1.abb.apply(lambda x: pd.Series(list(x)))
           .stack()
           .reset_index(drop=True, level=1)
           .rename('abb')
print (s)
0    I
1    R
2    Q
3    I
3    R
3    Q
4    I
4    Q
Name: abb, dtype: object

df1 = df1.drop('abb', axis=1).join(s)
print (df1)
     name      ID abb
0     foo  251803   I
1     bar  376811   R
2     baz  174254   Q
3  foofoo  337144   I
3  foofoo  337144   R
3  foofoo  337144   Q
4  barbar  306521   I
4  barbar  306521   Q

Upvotes: 2

Related Questions