Mike

Reputation: 201

How to separate individual sentences using nltk?

Programming Noob here, trying to use sent_tokenize to split text into separate sentences. While it appears to be working in the console (making each sentence its own list item), when I append the result to an empty list, I end up with a list (well, a list of lists of lists, given the syntax) of len 1 that I cannot iterate through. Basically, I want to be able to extract each individual sentence so that I can compare it with a given string, e.g. "Summer is great." There may be a better way to accomplish this, but please try to give me a simple solution, because Noob. I imagine there is a flag at the end of every sentence I could use to append sentences one at a time, so pointing me to that might be enough.

I've reviewed the documentation and tried the following code, but I still end up with my listz being of length 1 rather than broken into individual sentences.

import nltk
nltk.download('punkt')

from nltk import sent_tokenize, word_tokenize

listz = []

s = "Good muffins cost $3.88\nin New York.  Please buy me two of them.\n\nThanks."

listz.append([word_tokenize(t) for t in sent_tokenize(s)])

print(listz)

---
# Expected output:
listz = [["Good muffins cost $3.88 in New York."], ["Please buy me two of them."], ["Thanks."]]

Upvotes: 0

Views: 123

Answers (1)

iz_

Reputation: 16593

You should use extend instead of append. append adds its argument as a single element, so appending the whole list of tokenized sentences gives listz a length of 1; extend adds each item of the argument individually:

listz.extend([word_tokenize(t) for t in sent_tokenize(s)])

But in this case, simple assignment works:

listz = [word_tokenize(t) for t in sent_tokenize(s)]
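To make the append/extend difference concrete without needing the punkt download, here is a minimal sketch using a hand-made stand-in for the tokenized sentences (the `sentences` list below is hypothetical sample data, not real `word_tokenize` output):

```python
# Stand-in for [word_tokenize(t) for t in sent_tokenize(s)]:
sentences = [["Good", "muffins"], ["Please", "buy"], ["Thanks", "."]]

appended = []
appended.append(sentences)   # adds the entire list as ONE nested element
print(len(appended))         # → 1, which is the problem from the question

extended = []
extended.extend(sentences)   # adds each sentence's token list separately
print(len(extended))         # → 3

for tokens in extended:      # now you can iterate sentence by sentence
    print(tokens)
```

With extend (or direct assignment), each element of the list is one sentence's tokens, so comparing against a string like "Summer is great." becomes a simple loop over the list.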

Upvotes: 1
