Reputation: 510
I have a pandas dataframe as shown below.
DF_Old =
id tID word rA rB
12 1 A N N
12 2 B O N
12 2 B H O
12 2 B O O
12 2 B N H
Every word in word
with the same tID
is the same word with different values in the columns rA
and rB
. I would like to create a new column R
that includes all unique values in the columns rA
and rB
as shown below.
DF_New =
id tID word R
12 1 A N
12 2 B O,N,H
How can I do that?
Upvotes: 2
Views: 235
Reputation: 862511
Use groupby
and each group flatten by numpy.ravel
, get unique values by /numpy.unique
and last if necessary join
:
df = (df.groupby(['id','tID','word'])
.apply(lambda x: ', '.join(np.unique(x.values.ravel())))
.reset_index(name='R'))
print (df)
id tID word R
0 12 1 A N
1 12 2 B H, N, O
Another solution with melt
and drop_duplicates
, last only join
values:
df = df.melt(['id','tID','word'], value_name='R').drop_duplicates(['id','tID','word', 'R'])
df = df.groupby(['id','tID','word'])['R'].apply(', '.join).reset_index()
print (df)
id tID word R
0 12 1 A N
1 12 2 B O, H, N
Upvotes: 3