everythingfunctional

Reputation: 871

Python equivalent of Fortran list-directed input

I'd like to be able to read data from an input file in Python, similar to the way that Fortran handles a list-directed read (i.e. read (file, *) char_var, float_var, int_var).

The tricky part is that the way Fortran handles a read statement like this is very "forgiving" as far as the input format is concerned. For example, using the previous statement, this:

"some string" 10.0, 5

would be read the same as:

"some string",      10.0
5

and this:

"other string", 15.0 /

is read the same as:

"other string"
15.0
/

with the value of int_var retaining the same value it had before the read statement. And trickier still, this:

"nother string", , 7

will assign the values to char_var and int_var but float_var retains the same value as before the read statement.

Is there an elegant way to implement this?
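
A naive approach of splitting on commas and whitespace breaks down immediately on the quoted string (never mind the empty fields, the '/' terminator, and values continuing onto the next line):

    line = '"some string" 10.0, 5'
    # Treat commas as whitespace and split -- the quoted string is torn apart
    parts = line.replace(',', ' ').split()
    print(parts)  # ['"some', 'string"', '10.0', '5']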

Upvotes: 3

Views: 530

Answers (2)

everythingfunctional

Reputation: 871

Since I was not able to find an existing solution to this problem, I decided to write my own.

The main drivers are a reader class and a tokenizer. The reader gets one line at a time from the file, passes it to the tokenizer, and assigns the resulting values to the variables it is given, reading further lines as necessary.

class FortranAsciiReader(file):
    # Subclasses the built-in file type (Python 2), so readline() is inherited

    def read(self, *args):
        """
        Read from the file into the given objects
        """
        num_args = len(args)
        num_read = 0
        encountered_slash = False
        # If a line contained '/' or we read into all variables, we're done
        while num_read < num_args and not encountered_slash:
            line = self.readline()
            if not line:
                raise Exception("end of file reached before all values were read")
            values = tokenize(line)
            # Assign elements one-by-one into args, skipping empty fields
            # and stopping at a '/'
            for val in values:
                if val == '/':
                    encountered_slash = True
                    break
                elif val == '':
                    num_read += 1
                else:
                    args[num_read].assign(val)
                    num_read += 1
                    if num_read == num_args:
                        break

The tokenizer splits the line into tokens in accordance with the way that Fortran performs list-directed reads, where ',' and whitespace are separators, tokens may be repeated via 4*token, and a '/' terminates input.
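
As an illustration only (the actual tokenizer lives in the repository linked below), a line exercising all three rules would be expected to come out roughly like this, with repeats expanded and the '/' kept as a token for the reader to spot:

    # Illustrative expectation, not output from the real tokenizer
    line = '2*"ab", 3*1.5 / anything after the slash is ignored'
    expected = ['ab', 'ab', '1.5', '1.5', '1.5', '/']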

My implementation of the tokenizer is a bit long to reproduce here, and I also included classes to transparently provide the functionality of the basic Fortran intrinsic types (e.g. Real, Character, Integer). The whole project can be found on my github account, currently at https://github.com/bprichar/PyLiDiRe. Thanks to jsbueno for the inspiration for the tokenizer.
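
A hypothetical usage sketch (Python 2, since the reader subclasses file; the Real/Character/Integer wrappers and their assign method and value attribute are assumed from the description above, not shown here):

    # Hypothetical sketch -- Real/Character/Integer come from the PyLiDiRe
    # project; their .assign()/.value interface is assumed here
    char_var = Character()
    float_var = Real()
    int_var = Integer()

    f = FortranAsciiReader("input.txt")
    f.read(char_var, float_var, int_var)
    print(char_var.value, float_var.value, int_var.value)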

Upvotes: 1

jsbueno

Reputation: 110261

That is indeed tricky - I found it easier to write a pure-Python state-based tokenizer than to think up a regular expression to parse each line (though it is possible).

I've used the link provided by Vladimir as the spec - the tokenizer has some doctests that pass.

def tokenize(line, separator=',', whitespace="\t\n\x20", quote='"'):
    """
    >>> tokenize('"some string" 10.0, 5')
    ['some string', '10.0', '5']

    >>> tokenize(' "other string", 15.0 /')
    ['other string', '15.0', '/']

    >>> tokenize('"nother string", , 7')
    ['nother string', '', '7']

    """
    inside_str = False
    token = ""
    tokens = []
    just_added = False
    for char in line:
        if char in quote:
            if not inside_str:
                inside_str = True
            else:
                inside_str = False
                tokens.append(token)
                token = ""
                just_added = True
            continue
        if char in (whitespace + separator) and not inside_str:
            if token:
                tokens.append(token)
                token = ""
                just_added = True
            elif char in separator:
                if not just_added:
                    tokens.append("")
                just_added = False
            continue
        token += char
    if token:
        tokens.append(token)
    return tokens
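
Pasted into a module, the doctests can be checked directly:

    import doctest
    doctest.testmod()  # silently passes if the three examples above hold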


class Character(object):
    # Mimics a Fortran character variable: with a fixed length, values are
    # truncated or space-padded to fit; with length=None, text passes through
    def __init__(self, length=None):
        self.length = length
    def __call__(self, text):
        if self.length is None:
            return text
        if len(text) > self.length:
            return text[:self.length]
        return "{{:{}}}".format(self.length).format(text)


def make_types(types, default_value):
    # A list of converters plus a matching list of initial values
    return types, [default_value] * len(types)


def fortran_reader(file, types, default_char="/", default_value=None, **kw):
    types, results = make_types(types, default_value)
    while True:
        tokens = []
        # Accumulate lines until there are enough tokens for every field
        while len(tokens) < len(results):
            try:
                line = next(file)
            except StopIteration:
                return
            tokens += tokenize(line, **kw)
        for i, (type_, token) in enumerate(zip(types, tokens)):
            # An empty field or the '/' terminator keeps the previous value
            if not token or token in default_char:
                continue
            results[i] = type_(token)
        changed_types = yield(results)
        if changed_types:
            types, results = make_types(changed_types, default_value)

I have not tested this thoroughly - apart from the tokenizer, which has the doctests above. The reader is designed to work in a Python for statement if the same fields are repeated over and over again - or it can be driven with the generator's send method to change the types to be read on each iteration.
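
A minimal sketch of driving it over an in-memory file, using plain float and int as stand-in converters (the answer leaves the concrete type wrappers open):

    import io

    data = io.StringIO('"some string" 10.0, 5\n'
                       '"other string", 15.0 /\n')
    reader = fortran_reader(data, [Character(), float, int], default_value=0)
    for results in reader:
        print(results)
    # ['some string', 10.0, 5]
    # ['other string', 15.0, 5] -- the trailing '/' kept the previous int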

Please test, and e-mail me (address on my profile) some test files. If there is indeed nothing similar out there, maybe this deserves some polishing and publishing on PyPI.

Upvotes: 3
