Reputation: 2993
I have two data frames which look like this
df1
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 IRQ
4 barbar 306521 IQ
df2
abb comment
0 I fine
1 R repeat
2 Q other
I am trying to use pandas merge to join the two data frames and simply assign the comment column in the second data frame to the first, based on the abb column, in the following way:
df1.merge(df2, how='inner', on='abb')
resulting in:
name ID abb comment
0 foo 251803 I fine
1 bar 376811 R repeat
2 baz 174254 Q other
This works well for the unique one-letter identifiers in abb. However, it fails for entries with more than one character, since values such as IRQ have no match in df2. I tried to use list on the abb column in the first data frame, but this results in a KeyError.
What I would like to do is the following.
1) Separate the rows containing more than one character in this column into several rows
2) Merge the data frames
3) Optionally: Combine the rows again
Upvotes: 1
Views: 1401
Reputation: 294498
See this answer for various ways to explode on a column
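import pandas as pd
rows = []
for i, row in df1.iterrows():
    # row.abb is a string, so iterating over it yields one character at a time
    for a in row.abb:
        # keep the values in df1's column order: name, ID, abb
        rows.append([row['name'], row['ID'], a])
df11 = pd.DataFrame(rows, columns=df1.columns)
df11.merge(df2)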
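As an alternative to the explicit loop, newer pandas versions (0.25 and later) provide `DataFrame.explode`. The following is a minimal sketch, assuming the sample frames from the question are rebuilt by hand; the names `exploded` and `merged` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foofoo", "barbar"],
    "ID": [251803, 376811, 174254, 337144, 306521],
    "abb": ["I", "R", "Q", "IRQ", "IQ"],
})
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

# Turn each abb string into a list of characters, then give
# every character its own row.
exploded = df1.assign(abb=df1["abb"].apply(list)).explode("abb")

# A left merge keeps the row order of the exploded frame.
merged = exploded.merge(df2, on="abb", how="left")
print(merged)
```

A left merge is used here so that a character without a match in df2 would still be kept, with a missing comment, rather than silently dropped.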
Upvotes: 1
Reputation: 863291
Use join:
print (df1)
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 IRQ
4 barbar 306521 IQ
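# split each abb string into characters (one column per character),
# then stack the columns into a single Series
s = (df1.abb.apply(lambda x: pd.Series(list(x)))
        .stack()
        .dropna()  # drop padding NaNs left by shorter strings, if any
        .reset_index(drop=True, level=1)
        .rename('abb'))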
print (s)
0 I
1 R
2 Q
3 I
3 R
3 Q
4 I
4 Q
Name: abb, dtype: object
df1 = df1.drop('abb', axis=1).join(s)
print (df1)
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 I
3 foofoo 337144 R
3 foofoo 337144 Q
4 barbar 306521 I
4 barbar 306521 Q
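The output above covers step 1 of the question. Steps 2 and 3 (merging, then optionally combining the rows again) could be sketched as follows. This is only one possible approach: the exploded frame is rebuilt by hand so the snippet runs on its own, and the names `exploded`, `merged` and `combined`, as well as the `", "` separator for comments, are assumptions rather than anything from the original posts.

```python
import pandas as pd

# The exploded frame shown above, rebuilt so the example is self-contained.
exploded = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foofoo", "foofoo", "foofoo", "barbar", "barbar"],
    "ID": [251803, 376811, 174254, 337144, 337144, 337144, 306521, 306521],
    "abb": ["I", "R", "Q", "I", "R", "Q", "I", "Q"],
})
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

# Step 2: attach the comment for every single-character abb.
merged = exploded.merge(df2, on="abb", how="left")

# Step 3 (optional): collapse back to one row per original record,
# concatenating the characters and joining the comments.
combined = (
    merged.groupby(["name", "ID"], sort=False, as_index=False)
          .agg({"abb": "".join, "comment": ", ".join})
)
print(combined)
```

Grouping on both name and ID assumes that this pair identifies an original row; if it does not, the original index could be carried along as a column and used as the grouping key instead.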
Upvotes: 2