Jiayu Zhang

Reputation: 719

Tokenize multiple sentences to rows in python pandas

I have a DataFrame with a text column like this:

id      text
1       Thanks.  I appreciate your help.  I really like this chat service as it is very convenient.  I hope you have a wonderful day! thanks!
2       Got it. Thanks for the help; good nite.

I want to split the text into sentences and keep each sentence matched to its id. My expected output is:

id      text
1       Thanks.
1       I appreciate your help.
1       I really like this chat service as it is very convenient.
1       I hope you have a wonderful day!
1       thanks!
2       Got it.
2       Thanks for the help;
2       good nite.

Are there any nltk functions that can handle this problem?

Upvotes: 2

Views: 399

Answers (1)

BENY

Reputation: 323226

First split the text, then use explode. If you have not upgraded pandas to 0.25 or later, see "How to unnest (explode) a column in a pandas DataFrame?"

df.assign(text=df.text.str.split('[.!;]')).explode('text').loc[lambda x : x.text!='']
Out[181]: 
                                                text  id
0                                             Thanks   1
0                             I appreciate your help   1
0    I really like this chat service as it is ver...   1
0                    I hope you have a wonderful day   1
0                                             thanks   1
1                                             Got it   2
1                                Thanks for the help   2
1                                          good nite   2
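
Note that splitting on '[.!;]' drops the punctuation and leaves leading spaces on each piece, so the result differs slightly from the expected output in the question. A minimal sketch of one way to keep the punctuation, assuming sentences always end in '.', '!', '?' or ';' followed by whitespace (the lookbehind pattern and the name `sentences` are illustrative, not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "text": [
        "Thanks.  I appreciate your help.  I really like this chat service "
        "as it is very convenient.  I hope you have a wonderful day! thanks!",
        "Got it. Thanks for the help; good nite.",
    ],
})

# Split on the whitespace that follows a sentence-ending character.
# The lookbehind (?<=...) matches the position without consuming the
# punctuation, so each piece keeps its trailing '.', '!', '?' or ';'.
# A multi-character pattern is treated as a regular expression by str.split.
sentences = (
    df.assign(text=df["text"].str.split(r"(?<=[.!?;])\s+"))
      .explode("text")            # one row per sentence (pandas >= 0.25)
      .reset_index(drop=True)
)

print(sentences)
```

Because the split consumes the whitespace between sentences, no stripping or empty-string filter is needed for this sample. Text with abbreviations such as "e.g." would be split incorrectly by this simple pattern.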

Upvotes: 6
