Reputation: 899
I'm trying to tokenize the sentences in a CSV file into words, but my loop never moves on to the next sentence — it only processes the first column. Any idea where the mistake is?
This is what my CSV file looks like:
import re
import string
import pandas as pd
text=pd.read_csv("data.csv")
from nltk.tokenize import word_tokenize
tokenized_docs=[word_tokenize(doc) for doc in text]
x=re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
The output I'm getting looks like this.
I expected the loop to process all the sentences, not just the one.
Upvotes: 2
Views: 51
Reputation: 7224
Iterating over a DataFrame yields its column names, not the row values. You just need to change the code to select the column so you grab the sentences:
import re
import string
import pandas as pd
from nltk.tokenize import word_tokenize

text = pd.read_csv("out157.txt", sep="|")
tokenized_docs = [word_tokenize(doc) for doc in text['SENTENCES']]
x = re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
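For context, here is a minimal, dependency-free sketch of the difference (with made-up data, and `str.split` standing in for nltk's `word_tokenize` so it runs without nltk):

```python
import re
import string
import pandas as pd

# Hypothetical data standing in for the CSV; the real file has a SENTENCES column.
df = pd.DataFrame({"SENTENCES": ["Hello, world!", "Another sentence."]})

# Iterating the DataFrame itself yields the column labels, not the rows:
print(list(df))  # ['SENTENCES']

# Iterating a single column yields the cell values, one sentence at a time:
punct = re.compile('[%s]' % re.escape(string.punctuation))
tokenized = [[punct.sub('', tok) for tok in doc.split()]
             for doc in df['SENTENCES']]
print(tokenized)  # [['Hello', 'world'], ['Another', 'sentence']]
```

That is why `for doc in text` tokenized the column name instead of your sentences, while `for doc in text['SENTENCES']` visits every row.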
Upvotes: 3