programming freak

Reputation: 899

Using NLTK to tokenize sentences into words with pandas

I'm trying to tokenize sentences from a CSV file into words, but my loop isn't advancing to the next sentence; it only processes the first column. Any idea where the mistake is? This is what my CSV file looks like: [screenshot of the CSV file]

import re
import string
import pandas as pd
text=pd.read_csv("data.csv")
from nltk.tokenize import word_tokenize
tokenized_docs=[word_tokenize(doc) for doc in text]
x=re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []

The output I'm getting is like this:

[screenshot of the output]

I expected the loop to do this for all sentences, not just one.

Upvotes: 2

Views: 51

Answers (1)

oppressionslayer

Reputation: 7224

You just need to change the code to grab the sentences from the `SENTENCES` column; iterating over a DataFrame directly yields its column names, not the row values:

import re
import string
import pandas as pd
text=pd.read_csv("out157.txt", sep="|")
from nltk.tokenize import word_tokenize
tokenized_docs=[word_tokenize(doc) for doc in text['SENTENCES']]
x=re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
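The reason the fix works can be shown without NLTK at all. Below is a minimal sketch using a tiny in-memory DataFrame (the column name `SENTENCES` and the sample sentences are assumptions standing in for the asker's CSV): iterating the DataFrame itself gives column labels, while selecting the column gives one sentence per row.

```python
import pandas as pd

# Stand-in for the CSV: assumed to have a column named SENTENCES.
df = pd.DataFrame({"SENTENCES": ["Hello world.", "Second sentence here."]})

# Iterating a DataFrame directly yields its column NAMES, not its rows.
# This is why the original loop only ever tokenized the column header.
print(list(df))  # ['SENTENCES']

# Selecting the column iterates over its values, one sentence per row,
# which is what word_tokenize should receive.
print(list(df["SENTENCES"]))  # ['Hello world.', 'Second sentence here.']
```

So `[word_tokenize(doc) for doc in text]` tokenized the header string once, while `[word_tokenize(doc) for doc in text['SENTENCES']]` tokenizes every sentence in the column.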

Upvotes: 3
