The Puma

Reputation: 1390

How do I take advantage of Python generators when reading in a huge file and parsing by word?

Here is the relevant code I have. It is using a generator to get the words from the file. However, the words are first stored into a variable before entering a function. Is this correct?

Does this take advantage of the generator functionality?

def do_something(words):
    new_list = {}
    for word in words:
        # do stuff to each word
        # then add to new_list
    return new_list

def generate_words(input_file):
    for line in input_file:
        for word in line.split(' '):
            # do stuff to word
            yield word

if __name__ == '__main__':
    with open("in.txt") as input_file:
        words = generate_words(input_file)
        do_something(words)

Thank you

Upvotes: 1

Views: 120

Answers (3)

Eduard Iskandarov

Reputation: 862

There is no advantage to using generators in the given example. Their main purpose is to reduce memory usage.

In the code:

for line in input_file:

line has already been read from the file and consumed memory. Then the split operation creates a new list, consuming memory a second time.

So all you have to do is iterate through the list items.

Using a generator here only creates a generator object that yields items from an already-existing list. It is completely useless.
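If the per-line list built by split is the concern, one alternative (not part of the original answer, sketched here under the same line-by-line reading loop) is re.finditer, which yields one match at a time instead of materializing a list of words for each line:

```python
import io
import re

def generate_words(input_file):
    # re.finditer yields one match object at a time, so no
    # intermediate list of words is built per line (unlike line.split(' ')).
    for line in input_file:
        for match in re.finditer(r"\S+", line):
            yield match.group(0)

# Usage, with an in-memory file standing in for open("in.txt"):
source = io.StringIO("alpha beta\ngamma delta\n")
print(list(generate_words(source)))  # → ['alpha', 'beta', 'gamma', 'delta']
```

For typical line lengths the saving is tiny, which is consistent with the point above: the memory cost here is dominated by the line itself, not the split list.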

Upvotes: -1

The code looks fine. What is stored in words is a fresh generator, prepared to run the code in generate_words; that code only actually runs when the for word in words: loop is triggered. If you want to know more, this SO question has a whole heap of information.

Upvotes: 2

jamylak

Reputation: 133574

When you write words = generate_words(input_file), you are simply binding words to a reference to the newly created generator. Only when do_something runs is the generator actually iterated. So the answer is yes: you are taking advantage of generators.
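A minimal sketch of this laziness (using an in-memory list of lines in place of the file, with a print added purely to make the timing visible):

```python
def generate_words(lines):
    for line in lines:
        for word in line.split(' '):
            print("yielding:", word)  # runs only during iteration
            yield word

words = generate_words(["a b", "c d"])
print("generator created, nothing yielded yet")  # printed before any "yielding:" line
print(list(words))  # → ['a', 'b', 'c', 'd']
```

Creating the generator prints nothing; all the "yielding:" lines appear only once list(words) starts consuming it.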

Upvotes: 4
