Reputation: 839
I have a text file named test.txt. I want to read it and return a list of all the words in the file (with newlines removed).
This is my current code:
def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list = []
    contents = open_file.readlines()
    for i in range(len(contents)):
        words_list.append(contents[i].strip('\n'))
    return words_list
    open_file.close()
Running this code produces this list:
['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']
I want the list to look like this:
['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']
Upvotes: 7
Views: 40907
Reputation: 41
The actual question has already been answered, but I would like to point out that the line open_file.close() is never executed, because the function returns before reaching it. Move open_file.close() before the return statement.
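A minimal sketch of that fix, assuming the parameter is named words_file as in the function body. The try/finally is my addition so the file is closed even if reading fails, and the split() call is the word-splitting fix suggested in the other answers:

```python
import os
import tempfile

def read_words(words_file):
    open_file = open(words_file, 'r')
    try:
        words_list = []
        for line in open_file:
            # split() with no arguments splits on any whitespace,
            # so the trailing newline is discarded as well
            words_list.extend(line.split())
    finally:
        # runs before the function hands back its result
        open_file.close()
    return words_list

# Usage: write the sample lines from the question to a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.txt')
    with open(path, 'w') as f:
        f.write('hello there how is everything \nthank you all\nagain\nthanks a lot\n')
    words = read_words(path)

print(words)
```

A with block does the same job in fewer lines, which is why most of the other answers use it.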
Upvotes: 0
Reputation: 113915
There are several ways to do this. Here are a few:
If you don't care about repeated words:
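import itertools

def getWords(filepath):
    with open(filepath) as f:
        # chain.from_iterable flattens the per-line word lists into one sequence
        return list(itertools.chain.from_iterable(line.split() for line in f))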
If you want each word to appear only once (this returns a set rather than a list):
Note: this does not preserve the order of the words
def getWords(filepath):
    with open(filepath) as f:
        return {word for line in f for word in line.split()}  # Python 2.7+
        # Python 2.6: return set(word for line in f for word in line.split())
If you want each word only once and want to preserve the order of first appearance:
import itertools

def getWords(filepath):
    with open(filepath) as f:
        words = []
        pos = {}
        position = itertools.count()
        for line in f:
            for word in line.split():
                if word not in pos:
                    pos[word] = next(position)
                    words.append(word)
    # words is already in first-seen order; the sort only makes that explicit
    return sorted(words, key=pos.__getitem__)
If you want a word-frequency dictionary:
import collections
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))
Hope these help
Upvotes: 3
Reputation: 309841
If the file is small enough to read into memory in one go, this is as easy as:
with open(words_file) as f:
    words = f.read().split()
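This works because str.split() with no arguments treats any run of whitespace, newlines included, as a single separator. A small illustration, using an in-memory string as a stand-in for the file contents (the sample text is taken from the question):

```python
# Stand-in for what f.read() would return for the question's file
text = 'hello there how is everything \nthank you all\nagain\nthanks a lot\n'

# No arguments: split on runs of whitespace and drop empty strings
words = text.split()
print(words)
```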
Upvotes: 20
Reputation: 208435
Replace the words_list.append(...) line in the for loop with the following:
words_list.extend(contents[i].split())
This will split each line on whitespace characters, and then add each element of the resulting list to words_list.
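The difference between the two list methods can be seen on a single line of input. A small sketch, with the sample line taken from the question's file and the variable names chosen only for illustration:

```python
line = 'hello there how is everything \n'

# append adds its argument as ONE element, so the whole line stays together
with_append = []
with_append.append(line.strip('\n'))

# extend adds each element of the iterable, so every word becomes its own item
with_extend = []
with_extend.extend(line.split())

print(with_append)  # ['hello there how is everything ']
print(with_extend)  # ['hello', 'there', 'how', 'is', 'everything']
```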
Alternatively, the entire function can be rewritten as a list comprehension:
def read_words(words_file):
    with open(words_file, 'r') as f:
        return [word for line in f for word in line.split()]
Upvotes: 14
Reputation: 500257
Here is how I'd write that:
def read_words(words_file):
    with open(words_file, 'r') as f:
        ret = []
        for line in f:
            ret += line.split()
        return ret

print(read_words('test.txt'))
The function can be somewhat shortened by using itertools, but I personally find the result less readable:
import itertools

def read_words(words_file):
    with open(words_file, 'r') as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))

print(read_words('test.txt'))
The nice thing about the second version is that it can be made to be entirely generator-based and thus avoid keeping all of the file's words in memory at once.
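One way that generator-based variant might look is sketched below. The name iter_words is hypothetical, and the yield loop is there so the file stays open while the caller is still consuming words (returning the chain object directly from inside the with block would close the file too early):

```python
import itertools
import os
import tempfile

def iter_words(words_file):
    """Yield the file's words one at a time instead of building a list."""
    with open(words_file, 'r') as f:
        # Only the current line's words are held in memory at any moment
        for word in itertools.chain.from_iterable(line.split() for line in f):
            yield word

# Usage: write the sample lines from the question to a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.txt')
    with open(path, 'w') as out:
        out.write('hello there how is everything \nthank you all\nagain\nthanks a lot\n')
    total = sum(1 for _ in iter_words(path))   # count words without storing them
    longest = max(iter_words(path), key=len)   # a second, independent pass

print(total, longest)
```

The trade-off is that the result can only be iterated once per call, so anything that needs random access or several passes should still use the list version.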
Upvotes: 5