put sentences into list - python

Question

I understand that nltk can split text into sentences and print them out using the following code. But how do I put the sentences into a list instead of outputting them to the screen?

import nltk.data
from nltk.tokenize import sent_tokenize
import os, sys, re, glob
cwd = './extract_en' #os.getcwd()
for infile in glob.glob(os.path.join(cwd, 'fileX.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    for line in read:
        sent_tokenize(line)

The sent_tokenize(line) call prints it out. How do I put it into a list?

senderle · Accepted Answer

Here's a simplified version that I used to test the code:

import nltk.data
from nltk.tokenize import sent_tokenize
import sys
infile = open(sys.argv[1])
slist = []
for line in infile:
    slist.append(sent_tokenize(line))
print slist
infile.close()

When called like so, it prints the following:

me@mine:~/src/ $ python nltkplay.py nltkplay.py 
[['import nltk.data\n'], ['from nltk.tokenize import sent_tokenize\n'], ['import sys\n'], ['infile = open(sys.argv[1])\n'], ['slist = []\n'], ['for line in infile:\n'], ['    slist.append(sent_tokenize(line))\n'], ['print slist\n'], ['\n']]

When doing something like this, a list comprehension is more concise and IMO more pleasant to read:

slist = [sent_tokenize(line) for line in infile]

To clarify, the above returns a list of lists of sentences, one list of sentences for each line. If you want a flat list of sentences, do this instead, as eyquem suggests:

slist = sent_tokenize(infile.read())
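
To make the difference between the nested and the flat result concrete, here is a minimal sketch. So that it runs without NLTK or its tokenizer data, it uses naive_sent_tokenize, a hypothetical regex-based stand-in that is not part of nltk and is far cruder than the real sent_tokenize. The sample lines are made up for illustration as well.

```python
import re
from itertools import chain


def naive_sent_tokenize(text):
    """Hypothetical stand-in for nltk's sent_tokenize (assumption, not NLTK code).

    Splits on whitespace that follows '.', '!' or '?'.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Made-up sample data standing in for the lines of a file.
lines = ["Hello there. How are you?\n", "Fine, thanks.\n"]

# One list of sentences per line: a list of lists.
nested = [naive_sent_tokenize(line) for line in lines]

# Flatten the per-line lists into a single list of sentences.
flat = list(chain.from_iterable(nested))

# Tokenizing the whole text at once also gives a flat list.
flat_whole = naive_sent_tokenize("".join(lines))

print(nested)
print(flat)
```

With the real tokenizer, the two flat results can differ: tokenizing line by line never joins a sentence that wraps across a line break, whereas tokenizing infile.read() sees the whole text at once.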
