Python: How do I read from stdin/file word by word?

Question

As the title says, how do I read from stdin or from a file word by word, rather than line by line? I'm dealing with very large files, not guaranteed to have any newlines, so I'd rather not load all of a file into memory. So the standard solution of:

import sys

for line in sys.stdin:
    for word in line.split():  # split() breaks the line on whitespace
        foo(word)

won't work, since line may be too large. Even if it's not too large, it's still inefficient since I don't need the entire line at once. I essentially just need to look at a single word at a time, and then forget it and move on to the next one, until EOF.

Peatherfed · Accepted Answer

Here's a straightforward answer:

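word = ''
with open('filename', 'r') as f:
    # Read one character at a time; the walrus operator needs Python 3.8+.
    while (c := f.read(1)):
        if c.isspace():
            if word:
                print(word)  # Do whatever you want here, e.g. append to a list
            word = ''
        else:
            word += c
# Emit the last word if the file does not end with whitespace.
if word:
    print(word)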

I will note that it would be faster to read larger chunks at a time and detect words after the fact. Ben Y's answer has an (as of this edit) incomplete solution that might be of assistance. If performance (rather than memory, as was my issue) is your concern, that should probably be your approach. The code will be quite a bit longer, however.
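For illustration, here is a minimal sketch of the chunked approach described above. It is not Ben Y's solution; the names iter_words and chunk_size are hypothetical and chosen only for this example. It assumes a text-mode stream and that words are separated by whitespace as str.split() defines it. Memory use stays bounded by the chunk size plus the length of the longest single word.

```python
import io


def iter_words(stream, chunk_size=64 * 1024):
    """Yield whitespace-separated words from a text stream, reading in chunks."""
    tail = ''  # partial word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty string means EOF
            break
        chunk = tail + chunk
        words = chunk.split()
        # If the chunk ends mid-word, hold the last piece back until more data arrives.
        if words and not chunk[-1].isspace():
            tail = words.pop()
        else:
            tail = ''
        yield from words
    # Whatever is left at EOF is the final word.
    if tail:
        yield tail


# A deliberately tiny chunk size to exercise the boundary handling.
sample = io.StringIO("alpha  beta\ngamma")
print(list(iter_words(sample, chunk_size=4)))
```

The same generator should work for standard input by passing sys.stdin instead of a file object, since both expose read(n) in text mode.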
