programming freak

Reputation: 899

Using NLTK to tokenize sentences into words with pandas

I'm trying to tokenize sentences from a CSV file into words, but my loop isn't advancing to the next sentence; it only processes the first column. Any idea where the mistake is? This is what my CSV file looks like: [screenshot of the CSV file]

import re
import string
import pandas as pd
text=pd.read_csv("data.csv")
from nltk.tokenize import word_tokenize
tokenized_docs=[word_tokenize(doc) for doc in text]
x=re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []

The output I'm getting is like this:

[screenshot of the output]

I expected the loop to do this for all sentences, not just one.

Upvotes: 2

Views: 51

Answers (1)

oppressionslayer

Reputation: 7224

You just need to change the code to grab the sentences from the `SENTENCES` column; iterating over a DataFrame directly yields its column names, not the row values:

import re
import string
import pandas as pd
text=pd.read_csv("out157.txt", sep="|")
from nltk.tokenize import word_tokenize
tokenized_docs=[word_tokenize(doc) for doc in text['SENTENCES']]
x=re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
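The reason the fix works can be shown without NLTK at all. Below is a minimal sketch using a tiny in-memory DataFrame (the column name `SENTENCES` and the sample sentences are assumptions standing in for the asker's CSV): iterating the DataFrame itself gives column labels, while selecting the column gives one sentence per row.

```python
import pandas as pd

# Stand-in for the CSV: assumed to have a column named SENTENCES.
df = pd.DataFrame({"SENTENCES": ["Hello world.", "Second sentence here."]})

# Iterating a DataFrame directly yields its column NAMES, not its rows.
# This is why the original loop only ever tokenized the column header.
print(list(df))  # ['SENTENCES']

# Selecting the column iterates over its values, one sentence per row,
# which is what word_tokenize should receive.
print(list(df["SENTENCES"]))  # ['Hello world.', 'Second sentence here.']
```

So `[word_tokenize(doc) for doc in text]` tokenized the header string once, while `[word_tokenize(doc) for doc in text['SENTENCES']]` tokenizes every sentence in the column.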

Upvotes: 3
