Roman Kazmin

Reputation: 981

Tokenize dataframe column and create new dataframe for result

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'category': [1,2,1], 'names' : ['ab c', 's', 'dm ab aaa']})

   category      names
0         1       ab c
1         2          s
2         1  dm ab aaa

I need to split the names column into tokens (separated by spaces), assign each token the category of its row, and create a new dataframe, as shown below:

pd.DataFrame({'category' : [1, 1,2,1,1,1], 'names' : ['ab', 'c', 's', 'dm', 'ab', 'aaa']})

   category names
0         1    ab
1         1     c
2         2     s
3         1    dm
4         1    ab
5         1   aaa

What is the best way to do this?

Upvotes: 1

Views: 103

Answers (1)

akuiper

Reputation: 214957

You can split the names column into lists of tokens first and then explode it (DataFrame.explode requires pandas 0.25 or later):

df.assign(names = df.names.str.split()).explode('names')

#   category names
#0         1    ab
#0         1     c
#1         2     s
#2         1    dm
#2         1    ab
#2         1   aaa
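
To see why this works, it may help to look at the intermediate step. This is only a minimal sketch that rebuilds the df from the question; the names `tokens` and `exploded` are mine, for illustration:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 2, 1], 'names': ['ab c', 's', 'dm ab aaa']})

# str.split() with no arguments splits on runs of whitespace,
# so each string becomes a list of tokens.
tokens = df.names.str.split()
print(tokens.tolist())

# explode() emits one row per list element and repeats the other
# columns (and the original index label) for every token.
exploded = df.assign(names=tokens).explode('names')
print(exploded.index.tolist())
```

The repeated index labels are the reason the output above shows 0, 0, 1, 2, 2, 2 in the first column.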

If you need to reset the index (from @KRKirov's comment):

df.assign(names = df.names.str.split()).explode('names').reset_index(drop=True)
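
As a quick sanity check, the result can be compared with the frame the question asks for. This is a sketch under the assumption that only the column values matter; `expected` and `result` are names I made up:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 2, 1], 'names': ['ab c', 's', 'dm ab aaa']})

# Desired output, copied from the question.
expected = pd.DataFrame({'category': [1, 1, 2, 1, 1, 1],
                         'names': ['ab', 'c', 's', 'dm', 'ab', 'aaa']})

result = (
    df.assign(names=df.names.str.split())
      .explode('names')
      .reset_index(drop=True)
)

# Compare column values only, ignoring dtype details of the string column.
print(result.to_dict('list') == expected.to_dict('list'))
```

Comparing plain lists rather than using DataFrame.equals keeps the check independent of how a given pandas version types string columns.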

Upvotes: 1
