Jeff

Reputation: 99

How to read input one whitespace-separated value at a time?

In C++ you can read one value at a time like this:

//from console
cin >> x;

//from file:
ifstream fin("file name");
fin >> x;

I would like to emulate this behaviour in Python. It seems, however, that the ordinary ways to get input in Python read either whole lines, the whole file, or a set number of characters.

I would like a function, let's call it one_read(), that reads from a file until it encounters a whitespace character (a space or a newline, for example), then stops. On subsequent calls to one_read() the input should resume where it left off. An example of how it should work:

# file input.in is:
# 5 4
# 1 2 3 4 5
n = int(one_read())
k = int(one_read())
a = []
for i in range(n):
    a.append(int(one_read()))
# n = 5 , k = 4 , a = [1,2,3,4,5]

How can I do this?

Upvotes: 0

Views: 509

Answers (4)

match

Reputation: 11070

Normally you would just read a line at a time, then split it and work with each part. However, if you can't do this for resource reasons, you can implement your own reader which reads one character at a time and yields a word each time it reaches a delimiter (in this example, the delimiter you pass in, a newline, or the end of the file).

This implementation uses a context manager to handle the file opening/reading, though this might be overkill:

from functools import partial

class Words():
    def __init__(self, fname, delim):
        self.delims = ['\n', delim]
        self.fname = fname
        self.fh = None

    def __enter__(self):
        self.fh = open(self.fname)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()

    def one_read(self):
        chars = []
        # iter(callable, sentinel) calls fh.read(1) until it returns '' (end of file)
        for char in iter(partial(self.fh.read, 1), ''):
            if char in self.delims:
                # delimiter signifies end of word; consecutive
                # delimiters do not produce empty words
                if chars:
                    yield ''.join(chars)
                    chars = []
            else:
                chars.append(char)
        if chars:
            # end of file also ends the last word
            yield ''.join(chars)

# Assuming x.txt contains 12 34 567 8910
with Words('/tmp/x.txt', ' ') as w:
    print(next(w.one_read()))
    # 12
    print(next(w.one_read()))
    # 34
    print(list(w.one_read()))
    # ['567', '8910']
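
For comparison, the usual line-at-a-time approach mentioned at the top of this answer can be wrapped in a generator so that it still hands out one value per call. This is only a minimal sketch: iter_tokens is a hypothetical helper name, and io.StringIO stands in for an open file so the example is self-contained.

```python
import io

def iter_tokens(stream):
    """Yield whitespace-separated tokens from a text stream, reading one line at a time."""
    for line in stream:
        # str.split() with no arguments splits on any run of whitespace
        # and drops empty strings, so blank lines yield nothing.
        yield from line.split()

# io.StringIO is a stand-in; an object returned by open("input.in") iterates the same way.
tokens = iter_tokens(io.StringIO("5 4\n1 2 3 4 5\n"))

def one_read():
    # Each call resumes the generator where the previous call stopped.
    return next(tokens)

n = int(one_read())
k = int(one_read())
a = [int(one_read()) for _ in range(n)]
print(n, k, a)
```

Only one line is held in memory at a time, so this should be enough unless individual lines are themselves too large to load.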

Upvotes: 1

philosofool

Reputation: 973

I think the following should get you close. I admit I haven't tested the code carefully. It sounds like itertools.takewhile should be your friend, and a generator like yield_characters below will be useful.

from itertools import takewhile
import re

# This generator yields the characters of a file one at a time.
def yield_characters(file):
    with open(file, 'r') as f:
        for line in f:
            for char in line:
                yield char

# True for any single non-whitespace character.
def not_whitespace(char):
    return bool(re.match(r"\S", char))

# This uses takewhile to collect characters while they are not whitespace.
def read_one(file):
    chars = yield_characters(file)
    for first in chars:
        # skip whitespace between words; the loop ends when the file is exhausted
        if not_whitespace(first):
            yield first + ''.join(takewhile(not_whitespace, chars))

The read_one above is a generator, so you will need to call next on it to get one value at a time, or call something like list on it to get them all.

Upvotes: 1

RossM

Reputation: 438

Try creating a class to remember where the operation left off.

The __init__ function takes the filename; you could modify this to take a list or other iterable.

read_one checks if there is anything left to read, and if there is, removes and returns the item at index 0 in the list; that being everything until the first whitespace.

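class Reader:
    def __init__(self, filename):
        # read the whole file up front and split it on any whitespace
        with open(filename) as f:
            self.file_contents = f.read().split()

    def read_one(self):
        # returns None once every value has been consumed
        if self.file_contents:
            return self.file_contents.pop(0)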

Initialise the reader as follows and adapt to your liking:

reader = Reader(filepath)
reader.read_one()
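
The modification mentioned above, taking a list or other iterable instead of a filename, might look something like the sketch below. TokenReader is a hypothetical name, not part of the answer's original code, and the use of collections.deque is my own substitution so that removing the first item does not shift the rest of the list.

```python
from collections import deque

class TokenReader:
    """Hypothetical variant of Reader that accepts any iterable of strings."""

    def __init__(self, lines):
        # Split every string up front; deque.popleft() is O(1),
        # whereas list.pop(0) has to shift all remaining items.
        self.tokens = deque(tok for line in lines for tok in line.split())

    def read_one(self):
        # Return the next token, or None when nothing is left (as Reader does).
        return self.tokens.popleft() if self.tokens else None

reader = TokenReader(["5 4", "1 2 3 4 5"])
n = int(reader.read_one())
k = int(reader.read_one())
a = [int(reader.read_one()) for _ in range(n)]
print(n, k, a)
```

Since an open text file is itself an iterable of lines, the same class should also accept a file object.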

Upvotes: -1

Karl Knechtel

Reputation: 61643

More or less anything that operates on files in Python can operate on the standard input and standard output. The sys standard library module defines stdin and stdout which give you access to those streams as file-like objects.

Reading a line at a time is considered idiomatic in Python because the other way is quite error-prone (just one C++ example question on Stack Overflow). But if you insist: you will have to build it yourself.

As you've found, .read(n) will read at most n text characters (technically, Unicode code points) from a stream opened in text mode. You can't tell where the end of the word is until you read the whitespace, but you can .seek back one spot - though not on the standard input, which isn't seekable.
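
As a rough sketch of building it yourself with .read(1): the function below consumes the whitespace that ends a word instead of seeking back, so it makes no assumption that the stream is seekable. The name one_read comes from the question; the behaviour at end of input (returning an empty string) is my own assumption, and io.StringIO stands in for a real file or sys.stdin.

```python
import io

def one_read(stream):
    """Return the next whitespace-delimited word from stream, or '' at end of input."""
    chars = []
    while True:
        char = stream.read(1)
        if char == '':
            break  # end of input
        if char.isspace():
            if chars:
                break  # whitespace after a word ends it (and is consumed)
            continue   # skip leading whitespace
        chars.append(char)
    return ''.join(chars)

# io.StringIO is a stand-in for any text stream opened in text mode.
stream = io.StringIO("5 4\n1 2 3 4 5")
n = int(one_read(stream))
k = int(one_read(stream))
a = [int(one_read(stream)) for _ in range(n)]
print(n, k, a)
```

Because the position lives in the stream object, each call picks up where the previous one stopped.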

You should also be aware that the built-in input will ignore any existing data on the standard input before prompting the user:

>>> import sys
>>> sys.stdin.read(1) # blocks
foo
'f'
>>> # the `foo` is our input, the `'f'` is the result
>>> sys.stdin.read(1) # data is available; doesn't block
'o'
>>> input()
bar
'bar'
>>> # the second `o` from the first input was lost

Upvotes: 0
