mzn.rft

Reputation: 839

Returning a list of words after reading a file in Python

I have a text file named test.txt. I want to read it and return a list of all the words in the file, with newlines removed.

This is my current code:

def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list = []
    contents = open_file.readlines()
    for i in range(len(contents)):
        words_list.append(contents[i].strip('\n'))
    return words_list
    open_file.close()

Running this code produces this list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']

I want the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']

Upvotes: 7

Views: 40907

Answers (5)

Shivam Shah

Reputation: 41

The actual question has already been answered, but I would like to point out that the line open_file.close() will never run, because the function returns before reaching it. Move open_file.close() above the return statement.
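A with statement avoids the problem altogether: the file is closed when the block exits, whether that happens at the end of the block, on a return from inside it, or on an exception. Below is a minimal sketch assuming Python 3 and that whitespace-separated words are wanted; the temporary-directory demo and the SAMPLE text (rebuilt from the output shown in the question) are illustrative assumptions.

```python
import os
import tempfile

def read_words(words_file):
    """Return every whitespace-separated word in words_file."""
    words_list = []
    # Leaving the with block closes the file automatically; it would
    # also be closed on an early return or an exception.
    with open(words_file, 'r') as open_file:
        for line in open_file:
            words_list.extend(line.split())
    return words_list

# Illustrative demo: write sample text to a temporary file, read it back,
# and let TemporaryDirectory delete everything afterwards.
SAMPLE = "hello there how is everything \nthank you all\nagain\nthanks a lot\n"
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')
    with open(path, 'w') as out:
        out.write(SAMPLE)
    result = read_words(path)

print(result)
```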

Upvotes: 0

inspectorG4dget

Reputation: 113915

There are several ways to do this. Here are a few:

If you don't care about repeated words:

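import itertools

def getWords(filepath):
    with open(filepath) as f:
        # chain.from_iterable flattens the per-line word lists into one sequence
        return list(itertools.chain.from_iterable(line.split() for line in f))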
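The difference between itertools.chain and itertools.chain.from_iterable is easy to miss when the only argument is a generator. A small in-memory check (the lines list is a hypothetical stand-in for a file object) shows what each one yields:

```python
import itertools

lines = ["hello there\n", "thank you all\n"]

# chain() treats each positional argument as one iterable. With a single
# generator argument, the generator's items (whole lists) come out as-is.
nested = list(itertools.chain(line.split() for line in lines))

# chain.from_iterable() takes one iterable of iterables and flattens it
# by one level, so the individual words come out.
flat = list(itertools.chain.from_iterable(line.split() for line in lines))

print(nested)  # [['hello', 'there'], ['thank', 'you', 'all']]
print(flat)    # ['hello', 'there', 'thank', 'you', 'all']
```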

If you want to return a list of words in which each word appears only once:

Note: this does not preserve the order of the words

def getWords(filepath):
    with open(filepath) as f:
        # Python 2.7+; on Python 2.6 use set(word for line in f for word in line.split())
        return {word for line in f for word in line.split()}

If you want each word only once and also want to preserve the order in which the words first appear:

def getWords(filepath):
    words = []
    seen = set()
    with open(filepath) as f:
        for line in f:
            for word in line.split():
                # keep only the first occurrence of each word
                if word not in seen:
                    seen.add(word)
                    words.append(word)
    return words
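On Python 3.7 and later, where dictionaries are guaranteed to keep insertion order, the same order-preserving de-duplication can be sketched more compactly with dict.fromkeys. The function name unique_words_in_order and the io.StringIO stand-in for a real file are illustrative assumptions:

```python
import io

def unique_words_in_order(f):
    """Return each word from file object f once, in first-seen order.

    Assumes Python 3.7+, where dicts preserve insertion order.
    """
    # dict.fromkeys ignores repeated keys, and the keys come back out
    # in the order they were first inserted.
    return list(dict.fromkeys(word for line in f for word in line.split()))

sample = io.StringIO("the cat saw the dog\nthe dog saw a cat\n")
ordered = unique_words_in_order(sample)
print(ordered)  # ['the', 'cat', 'saw', 'dog', 'a']
```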

If you want a word-frequency dictionary:

import collections
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))
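Once built, the Counter can be queried directly. A short usage sketch, with a hypothetical word_counts helper that takes an already-open file object and an io.StringIO in place of test.txt:

```python
import collections
import io
import itertools

def word_counts(f):
    """Count how often each whitespace-separated word appears in f."""
    return collections.Counter(
        itertools.chain.from_iterable(line.split() for line in f))

sample = io.StringIO("thanks a lot\nthanks again\na lot of thanks\n")
counts = word_counts(sample)
print(counts['thanks'])       # 3
print(counts.most_common(1))  # [('thanks', 3)]
print(counts['missing'])      # 0, since Counter reports absent keys as zero
```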

Hope these help

Upvotes: 3

mgilson

Reputation: 309841

As long as the file fits comfortably in memory, this can be as simple as:

with open(filepath) as f:
    words = f.read().split()
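The key detail is that split() is called with no arguments. The string below is an assumption: it reconstructs test.txt from the list the question's code printed, including the trailing space on the first line.

```python
text = "hello there how is everything \nthank you all\nagain\nthanks a lot\n"

# With no argument, str.split() treats any run of whitespace (spaces,
# tabs, newlines) as a single separator and ignores leading/trailing
# whitespace, so the newlines vanish and no empty strings appear.
words = text.split()
print(words)
# ['hello', 'there', 'how', 'is', 'everything', 'thank', 'you', 'all',
#  'again', 'thanks', 'a', 'lot']

# Splitting on a single space instead leaves the newlines glued to the
# neighbouring words, e.g. 'all\nagain\nthanks' comes back as one item.
naive = text.split(' ')
```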

Upvotes: 20

Andrew Clark

Reputation: 208435

Replace the words_list.append(...) line in the for loop with the following:

words_list.extend(contents[i].split())

This will split each line on whitespace characters, and then add each element of the resulting list to words_list.
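The difference between append and extend is what matters here. A tiny illustration, using one line of the question's sample text as an assumed input:

```python
line = "thank you all\n"

appended = ['hello']
appended.append(line.split())  # adds the whole list as ONE element
print(appended)                # ['hello', ['thank', 'you', 'all']]

extended = ['hello']
extended.extend(line.split())  # adds each word as its own element
print(extended)                # ['hello', 'thank', 'you', 'all']
```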

Alternatively, the entire function can be rewritten as a list comprehension:

def read_words(words_file):
    with open(words_file, 'r') as f:
        return [word for line in f for word in line.split()]

Upvotes: 14

NPE

Reputation: 500257

Here is how I'd write that:

def read_words(words_file):
    with open(words_file, 'r') as f:
        ret = []
        for line in f:
            ret += line.split()
        return ret

print(read_words('test.txt'))

The function can be somewhat shortened by using itertools, but I personally find the result less readable:

import itertools

def read_words(words_file):
    with open(words_file, 'r') as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))

print(read_words('test.txt'))

The advantage of the second version is that it can be made entirely generator-based, which avoids keeping all of the file's words in memory at once.
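A sketch of a fully generator-based variant, assuming Python 3.3+ for yield from; the function names iter_words and iter_words_from_path are hypothetical. One subtlety: simply dropping list() and returning the generator expression from inside the with block would not work, because the file is closed as soon as the function returns, before the caller reads anything. Making the function itself a generator keeps the file open for as long as iteration continues.

```python
import io
import itertools

def iter_words(f):
    """Yield the whitespace-separated words of file object f one by one."""
    for line in f:
        # Only the current line and its words are held in memory.
        yield from line.split()

def iter_words_from_path(words_file):
    """Yield words lazily from the file at words_file."""
    # Because this function is itself a generator, the with block stays
    # open until the caller finishes iterating (or the generator is
    # closed), and only then is the file closed.
    with open(words_file, 'r') as f:
        yield from iter_words(f)

# Illustrative demo with an in-memory file: pull only the first three words.
sample = io.StringIO("hello there\nthanks a lot\n")
first_three = list(itertools.islice(iter_words(sample), 3))
print(first_three)  # ['hello', 'there', 'thanks']
```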

Upvotes: 5
