Mi.
Mi.

Reputation: 510

How to get all unique values in a new column

I have a pandas dataframe as shown below.

DF_Old =

id    tID       word   rA     rB
12    1         A      N      N
12    2         B      O      N
12    2         B      H      O
12    2         B      O      O
12    2         B      N      H

Every word in word with the same tID is the same word with different values in the columns rA and rB. I would like to create a new column R that includes all unique values in the columns rA and rB as shown below.

DF_New =

id    tID       word     R
12    1         A        N
12    2         B        O,N,H

How can I do that?

Upvotes: 2

Views: 235

Answers (1)

jezrael
jezrael

Reputation: 862511

Use groupby and each group flatten by numpy.ravel, get unique values by /numpy.unique and last if necessary join:

df = (df.groupby(['id','tID','word'])
       .apply(lambda x: ', '.join(np.unique(x.values.ravel())))
       .reset_index(name='R'))
print (df)
   id  tID word        R
0  12    1    A        N
1  12    2    B  H, N, O

Another solution with melt and drop_duplicates, last only join values:

df = df.melt(['id','tID','word'], value_name='R').drop_duplicates(['id','tID','word', 'R'])
df = df.groupby(['id','tID','word'])['R'].apply(', '.join).reset_index()
print (df)
   id  tID word        R
0  12    1    A        N
1  12    2    B  O, H, N

Upvotes: 3

Related Questions