Simon Lindgren

Reputation: 2031

Reading text files into lists in Python

Instead of defining documents like this ...

documents = ["the mayor of new york was there", "machine learning can be useful sometimes","new york mayor was present"]

... I want to read the same three sentences from two different txt files, with the first sentence in the first file and sentences 2 and 3 in the second file.

I have come up with this code:

# read txt documents
import glob
import os

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file, "r") as file_content:  # the file is closed when the block ends
        lines = file_content.read().splitlines()
    for line in lines:
        documents.append(line)

But the documents resulting from the two strategies seem to be in different formats. I want the second strategy to produce the same output as the first.

Upvotes: 1

Views: 1192

Answers (3)

Raymond Hettinger

Reputation: 226221

... I want to read the same three sentences from two different txt files, with the first sentence in the first file and sentences 2 and 3 in the second file.

Translating the requirements directly gives:

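with open('somefile1.txt') as f1:
    lines_file1 = f1.read().splitlines()  # splitlines() drops the trailing newlines
with open('somefile2.txt') as f2:
    lines_file2 = f2.read().splitlines()
# first sentence from file 1, sentences 2 and 3 from file 2
documents = lines_file1[0:1] + lines_file2[0:2]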

FWIW, given the kind of work you're doing, the fileinput module may be helpful.
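As a rough sketch of how fileinput could be used here: the helper name read_documents and the two temporary file names are made up for illustration, and the demo writes its own throwaway files so it can run anywhere.

```python
import fileinput
import os
import tempfile


def read_documents(paths):
    """Return every line of every file in `paths`, without trailing newlines."""
    paths = list(paths)
    if not paths:
        # fileinput falls back to stdin when it is given no files, so stop early.
        return []
    with fileinput.input(files=paths) as stream:
        # fileinput chains the files together and yields their lines in order.
        return [line.rstrip("\n") for line in stream]


# Demo with two throwaway files (hypothetical names, not from the question).
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "file1.txt")
    second = os.path.join(tmp, "file2.txt")
    with open(first, "w") as f:
        f.write("the mayor of new york was there\n")
    with open(second, "w") as f:
        f.write("machine learning can be useful sometimes\n"
                "new york mayor was present\n")
    documents = read_documents([first, second])

print(documents)
```

The files are read in the order they are passed, so the caller decides which sentence comes first.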

Hope this gets you back in business :-)

Upvotes: 0

OneCricketeer

Reputation: 191701

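If I understand your code correctly, this is nearly equivalent and more performant: it avoids reading the entire file into one string and then splitting it into a list. Note that iterating over a file keeps each line's trailing newline, which splitlines() removes.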

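import glob
import os

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file) as f:  # the file is closed when the block ends
        documents.extend(line for line in f)  # each line keeps its newline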

Or maybe even in one line (the outer loop over files has to come first in the comprehension):

documents = [line for file in glob.glob("*.txt") for line in open(file)]
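For what it's worth, here is a sketch of a variant that should give the same list as the hard-coded one. It rests on two assumptions that the question doesn't confirm: that the difference in "format" is the trailing newline, and that the files should be read in sorted name order (glob makes no promise about order). The function name load_documents and the file names a.txt and b.txt are invented for the demo.

```python
import glob
import os
import tempfile


def load_documents(directory):
    """Collect the lines of every .txt file in `directory` as plain strings."""
    documents = []
    # sorted() gives a deterministic file order; glob alone does not guarantee one.
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        with open(path) as handle:
            # Strip the trailing newline so entries match a hard-coded list.
            documents.extend(line.rstrip("\n") for line in handle)
    return documents


# Demo with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("the mayor of new york was there\n")
    with open(os.path.join(tmp, "b.txt"), "w") as f:
        f.write("machine learning can be useful sometimes\n"
                "new york mayor was present\n")
    documents = load_documents(tmp)

print(documents)
```

Passing the directory in, rather than calling os.chdir, leaves the working directory of the rest of the program alone.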

Upvotes: 1

K.Land_bioinfo

Reputation: 170

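Instead of .read().splitlines(), you can use .readlines(). This returns each file's contents as a list of lines. Unlike splitlines(), it keeps the trailing newline on each line, so strip it if you need the plain sentences.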
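A small sketch of the difference, using io.StringIO as a stand-in for an open text file so that it runs without any files on disk (the variable names are just for illustration):

```python
import io

text = "machine learning can be useful sometimes\nnew york mayor was present\n"

# readlines() keeps the newline at the end of each line.
with_newlines = io.StringIO(text).readlines()

# read().splitlines() drops the line endings.
without_newlines = io.StringIO(text).read().splitlines()

# Stripping the newline from readlines() output gives the same result.
stripped = [line.rstrip("\n") for line in with_newlines]

print(with_newlines)
print(without_newlines)
```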

Upvotes: 0
