alvas

Reputation: 122042

How to parse a string with key-value pairs separated by spaces?

Given a string as such:

LexicalReordering0= -1.88359 0 -1.6864 -2.34184 -3.29584 0 Distortion0= -4 LM0= -85.3898 WordPenalty0= -13 PhrasePenalty0= 11 TranslationModel0= -6.79761 -3.06898 -8.90342 -4.35544

Each key of the desired dictionary ends with =, and the space-separated numbers that follow it, up to the next key, are the values of that key.

Note that the names of the keys are not known before parsing the input string.

The resulting dictionary should look like this:

{'PhrasePenalty0=': [11.0], 'Distortion0=': [-4.0], 'TranslationModel0=': [-6.79761, -3.06898, -8.90342, -4.35544], 'LM0=': [-85.3898], 'WordPenalty0=': [-13.0], 'LexicalReordering0=': [-1.88359, 0.0, -1.6864, -2.34184, -3.29584, 0.0]}

I could do so with this loop:

>>> textin ="LexicalReordering0= -1.88359 0 -1.6864 -2.34184 -3.29584 0 Distortion0= -4 LM0= -85.3898 WordPenalty0= -13 PhrasePenalty0= 11 TranslationModel0= -6.79761 -3.06898 -8.90342 -4.35544"
>>> thiskey = ""
>>> thismap = {}
>>> for element in textin.split():
...     if element[-1] == '=':
...         thiskey = element
...         thismap[thiskey] = []
...     else:
...         thismap[thiskey].append(float(element))
... 
>>> thismap
{'PhrasePenalty0=': [11.0], 'Distortion0=': [-4.0], 'TranslationModel0=': [-6.79761, -3.06898, -8.90342, -4.35544], 'LM0=': [-85.3898], 'WordPenalty0=': [-13.0], 'LexicalReordering0=': [-1.88359, 0.0, -1.6864, -2.34184, -3.29584, 0.0]}

But is there another way to achieve the same dictionary from the input string? (maybe regex or some pythonic parser library?).

Upvotes: 2

Views: 4371

Answers (2)

rici

Reputation: 241701

Here's a way to do it using the regular expression library. I don't know if it is more efficient, or even if it could be described as pythonic:

import re

pat = re.compile(r'''([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)''')

# The values are lists of strings
entries = dict((k, v.split()) for k, v in pat.findall(textin))

# Alternative if you want the values to be floating point numbers
entries = dict((k, list(map(float, v.split())))
               for k, v in pat.findall(textin))

In Python 2.x, you can use map(float, v.split()) instead of list(map(float, v.split())).

Unlike the original program, this one stores each key without its trailing = and allows inputs where there is no whitespace between the = and the first value. Also, any items in the input before the first instance of key= are silently ignored. It might be better to explicitly recognize them and raise an error.
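If you do want stray items reported rather than ignored, one possible sketch is to walk the matches with finditer and check the gaps between them. The function name parse_strict and the error messages are my own choices here, not part of any library:

```python
import re

# Same pattern as in the answer above.
pat = re.compile(r'''([^\s=]+)=\s*((?:[^\s=]+(?:\s|$))*)''')

def parse_strict(text):
    """Parse 'key= v1 v2 ...' groups, raising on anything the pattern skips.

    parse_strict is a hypothetical helper name used only for this sketch.
    """
    entries = {}
    pos = 0  # end of the previous match
    for m in pat.finditer(text):
        # Text between two matches was not recognized by the pattern.
        skipped = text[pos:m.start()].strip()
        if skipped:
            raise ValueError("unexpected input: %r" % skipped)
        entries[m.group(1)] = [float(v) for v in m.group(2).split()]
        pos = m.end()
    # Anything left after the last match is also unrecognized.
    trailing = text[pos:].strip()
    if trailing:
        raise ValueError("unexpected input: %r" % trailing)
    return entries
```

With this version, an input such as "1 a= 2" raises ValueError instead of quietly dropping the leading 1.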

Explanation of the pattern:

([^\s=]+)                            A key (any non-whitespace except =)
         =\s*                        followed by = and possible whitespace
             ((?:[^\s=]+(?:\s|$))*)  Any number of repetitions of a string
                                     without = followed by either whitespace
                                     or the end of the input
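For what it's worth, the same explanation can live inside the pattern itself by compiling it with re.VERBOSE, which ignores whitespace outside character classes and allows # comments. This is only a sketch of an equivalent spelling; the name pat_verbose is made up for the example:

```python
import re

# Equivalent to the one-line pattern, written with re.VERBOSE so the
# comments sit next to the pieces they describe.
pat_verbose = re.compile(r'''
    ([^\s=]+)                 # a key: any run of non-whitespace except =
    = \s*                     # followed by = and optional whitespace
    (                         # the values, captured as one string:
      (?: [^\s=]+ (?:\s|$) )* #   any number of tokens without =, each
    )                         #   followed by whitespace or end of input
''', re.VERBOSE)

sample = "Distortion0= -4 LM0= -85.3898 WordPenalty0= -13"
entries = dict((k, list(map(float, v.split())))
               for k, v in pat_verbose.findall(sample))
```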

Upvotes: 3

gamda

Reputation: 580

Since your input string is separated by spaces and you either have keys or values, you can use split() and then loop through the elements and assign them.

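entries = textin.split()
answer = {}  # maps each key to its list of float values
key = ""
for x in entries:
    try:
        # anything that parses as a number is a value of the current key
        x = float(x)
        answer[key].append(x)
    except ValueError:
        # otherwise it is a new key
        key = x[:-1] # ignore last char '='
        answer[key] = []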

I am assuming that the first element of your string will always be a key, so answer[key] is never accessed while key is still an empty string.
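If you would rather not rely on that assumption, a variation is to tell keys apart by their trailing = (as the question's loop does) and fail loudly when a value shows up before any key. This is just a sketch; the function name parse_pairs and the error message are hypothetical:

```python
def parse_pairs(text):
    """Split on whitespace and group numeric values under the preceding key.

    parse_pairs is a hypothetical name for this sketch. Keys are stored
    without their trailing '='.
    """
    answer = {}
    key = None  # no key seen yet
    for token in text.split():
        if token.endswith("="):
            key = token[:-1]
            answer[key] = []
        elif key is None:
            # the assumption "first element is a key" does not hold
            raise ValueError("value %r appears before any key" % token)
        else:
            answer[key].append(float(token))
    return answer
```

A malformed number such as 1.2.3 then raises ValueError from float() instead of being treated as a new key.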

Upvotes: 0
