Reputation: 2031
Instead of defining documents like this ...
documents = ["the mayor of new york was there", "machine learning can be useful sometimes", "new york mayor was present"]
... I want to read the same three sentences from two different txt files with the first sentence in the first file, and sentence 2 and 3 in the second file.
I have come up with this code:
# read txt documents
import glob
import os

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file, "r") as file_content:
        lines = file_content.read().splitlines()
    for line in lines:
        documents.append(line)
But the documents resulting from the two strategies seem to be in a different format. I want the second strategy to produce the same output as the first.
Upvotes: 1
Views: 1192
Reputation: 226221
... I want to read the same three sentences from two different txt files with the first sentence in the first file, and sentence 2 and 3 in the second file.
Translating the requirements directly gives:
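with open('somefile1.txt') as f1:
    lines_file1 = f1.read().splitlines()  # splitlines() drops the trailing newlines
with open('somefile2.txt') as f2:
    lines_file2 = f2.read().splitlines()
# sentence 1 is the first line of file 1; sentences 2 and 3 are the first two lines of file 2
documents = lines_file1[0:1] + lines_file2[0:2]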
FWIW, given the kind of work you're doing, the fileinput module may be helpful.
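A minimal sketch of how fileinput could be used here. The temporary directory and the file names file1.txt and file2.txt are made up for illustration; only fileinput.input() itself comes from the standard library.

```python
import fileinput
import os
import tempfile

# Create two throwaway files so the sketch is self-contained (hypothetical names).
tmp_dir = tempfile.mkdtemp()
path1 = os.path.join(tmp_dir, "file1.txt")
path2 = os.path.join(tmp_dir, "file2.txt")
with open(path1, "w") as f:
    f.write("the mayor of new york was there\n")
with open(path2, "w") as f:
    f.write("machine learning can be useful sometimes\n"
            "new york mayor was present\n")

# fileinput.input() chains the given files and yields their lines in order.
with fileinput.input(files=(path1, path2)) as lines:
    # Lines arrive with their trailing newline, so strip it.
    documents = [line.rstrip("\n") for line in lines]

print(documents)
```

Passing the paths explicitly keeps the order under your control, which a glob pattern does not.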
Hope this gets you back in business :-)
Upvotes: 0
Reputation: 191701
If I understand your code correctly, this is equivalent and more performant: it iterates over each file line by line instead of reading the entire file into a string and then splitting it into a list.
import glob
import os

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file) as f:
        documents.extend(line.rstrip('\n') for line in f)  # drop trailing newlines
Or maybe even one line.
documents = [line.rstrip('\n') for file in glob.glob("*.txt") for line in open(file)]
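A self-contained sketch of the one-line version, under a few assumptions: the temporary directory and the file names a.txt and b.txt are made up for illustration, iter_lines is a hypothetical helper that closes each file, and sorted() is added because glob.glob() does not guarantee any particular order.

```python
import glob
import os
import tempfile


def iter_lines(path):
    """Yield the lines of one file without their trailing newline (hypothetical helper)."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Build two sample files in a temporary directory (names are illustrative only).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "a.txt"), "w") as f:
    f.write("the mayor of new york was there\n")
with open(os.path.join(tmp_dir, "b.txt"), "w") as f:
    f.write("machine learning can be useful sometimes\n"
            "new york mayor was present\n")

# Sort the paths so the files are read in a predictable order.
paths = sorted(glob.glob(os.path.join(tmp_dir, "*.txt")))

# In a nested comprehension the outer loop (files) comes first, the inner loop (lines) second.
documents = [line for path in paths for line in iter_lines(path)]

print(documents)
```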
Upvotes: 1
Reputation: 170
Instead of .read().splitlines(), you can use .readlines(). This will place every file's contents into a list. Note that .readlines() keeps the trailing newline on each line, whereas .splitlines() removes it.
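To see how the two calls differ, here is a small sketch. The temporary file sample.txt is made up for illustration; the point is only to compare the two lists.

```python
import os
import tempfile

# Write a small sample file (hypothetical name) to compare both approaches.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w") as f:
    f.write("machine learning can be useful sometimes\n"
            "new york mayor was present\n")

with open(path) as f:
    with_newlines = f.readlines()  # each element still ends in "\n"

with open(path) as f:
    without_newlines = f.read().splitlines()  # line endings removed

print(with_newlines)
print(without_newlines)
```

If the goal is a list that matches the hard-coded documents list, the newline-free form is probably the one to compare against.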
Upvotes: 0