pandas combine a data frame with another groupby dataframe

Question

I have two data frames with the structure shown below.

>>> df1
   IID   NAME   TEXT
0   10    One  AA,AB
1   11    Two  AB,AC
2   12  Three     AB
3   13   Four     AC
>>> df2
   IID TEXT
0   10   aa
1   10   ab
2   11  abc
3   11  a,c
4   11   ab
5   12   AA
6   13   AC
7   13   ad
8   13  abc

I want to combine them so that the new data frame is a copy of df1 in which the TEXT values from df2 for the corresponding IID are appended to the TEXT field of df1, with duplicates removed (the duplicate check is case-insensitive).

My expected output is

>>> df1
   IID   NAME           TEXT
0   10    One          AA,AB
1   11    Two  AB,AC,ABC,A,C
2   12  Three          AB,AA
3   13   Four      AC,AD,ABC

I tried a groupby on df2, but how can I join the resulting groupby object to a dataframe?

anky · Accepted Answer

I believe you need concat with groupby.agg to build a skeleton that still contains duplicates, then Series.explode with groupby + unique to de-duplicate:

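import pandas as pd

# Stack both frames, then per IID keep the first NAME and join every TEXT value
out = (pd.concat((df1, df2), sort=False).groupby('IID')
       .agg({'NAME': 'first', 'TEXT': ','.join}).reset_index())
# Upper-case, split into tokens, and keep each token once per row (order preserved)
out['TEXT'] = (out['TEXT'].str.upper().str.split(',').explode()
               .groupby(level=0).unique().str.join(','))
print(out)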

   IID   NAME           TEXT
0   10    One          AA,AB
1   11    Two  AB,AC,ABC,A,C
2   12  Three          AB,AA
3   13   Four      AC,AD,ABC
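
For the literal question of joining a groupby result back onto a dataframe, a minimal sketch could look like the following. It is not the code from the answer above, only one possible variant: it aggregates df2 per IID first and then left-joins that Series onto df1. The helper `merge_tokens` and the temporary column `EXTRA` are hypothetical names introduced here for illustration, and the frames are rebuilt from the sample data in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "IID": [10, 11, 12, 13],
    "NAME": ["One", "Two", "Three", "Four"],
    "TEXT": ["AA,AB", "AB,AC", "AB", "AC"],
})
df2 = pd.DataFrame({
    "IID": [10, 10, 11, 11, 11, 12, 13, 13, 13],
    "TEXT": ["aa", "ab", "abc", "a,c", "ab", "AA", "AC", "ad", "abc"],
})


def merge_tokens(row):
    # Hypothetical helper: upper-case both strings, split on commas,
    # and drop repeated tokens while keeping first-seen order.
    tokens = f"{row['TEXT']},{row['EXTRA']}".upper().split(",")
    return ",".join(dict.fromkeys(t for t in tokens if t))


# One comma-joined string per IID, indexed by IID
extra = df2.groupby("IID")["TEXT"].agg(",".join).rename("EXTRA")

# Left join on the IID column; IIDs missing from df2 get an empty string
out = df1.join(extra, on="IID")
out["EXTRA"] = out["EXTRA"].fillna("")

out["TEXT"] = out.apply(merge_tokens, axis=1)
out = out.drop(columns="EXTRA")
print(out)
```

Because the join is a left join, every row of df1 survives even when its IID has no match in df2, and df1 itself is left unmodified.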
