Bin

Reputation: 1000

python regex to construct a structured data structure

I have some data which looks like:

key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4
...
...

From it, I want to build a data structure like:

{'abc':[1,2,3]}
{'bcd':[2,3,4]}
...

Is a regular expression a good choice for this? If so, how should I write it so that the process behaves like a for loop, where inside the loop I can build the data structure from each piece of data I get?

Thanks.

Upvotes: 0

Views: 301

Answers (3)

Akshat Harit

Reputation: 824

The following code should work if the data is always in exactly that format (one key line followed by exactly three value lines).

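import re

FILENAME = "data.txt"  # path to the input file

with open(FILENAME, "r") as f:
    text = f.read()  # avoid shadowing the built-in str

# Exactly three "value" lines per key; (\d+) captures the whole number,
# whereas (\d)+ would only keep the last digit of a multi-digit value
regex = r'key (\S+) key\nvalue (\d+)\nvalue (\d+)\nvalue (\d+)'
matches = re.findall(regex, text)

dic = {}
for match in matches:
    # match[0] is the key name, match[1:] are the three values as strings
    dic[match[0]] = list(map(int, match[1:]))
print(dic)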

EDIT: The other answer, by meelo, is more robust because it handles cases where there are more or fewer than three values.
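To get the for-loop behaviour the question asks about while staying with regular expressions, `re.finditer` can be used instead of `re.findall`: it yields one match object at a time, one per key block, and the loop body builds that key's entry. A minimal sketch, assuming the input is already in a string (the question's sample data is hard-coded here) and that each block is a `key NAME key` line followed by one or more `value N` lines, so the number of values no longer has to be three:

```python
import re

# Sample input copied from the question (assumed to be already read into a string)
text = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

# One match per block: the key name, then one or more "value N" lines.
# re.MULTILINE lets ^ match at the start of every line, not just the string.
block_pat = re.compile(r'^key (\S+) key\n((?:value \d+\n?)+)', re.MULTILINE)
value_pat = re.compile(r'value (\d+)')

result = {}
# finditer yields match objects lazily, so this body runs once per key block
for m in block_pat.finditer(text):
    key, block = m.group(1), m.group(2)
    result[key] = [int(v) for v in value_pat.findall(block)]

print(result)  # {'abc': [1, 2, 3], 'bcd': [2, 3, 4]}
```

Because the inner `findall` collects however many `value` lines the block contains, a key with two or five values is handled the same way.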

Upvotes: 0

vks

Reputation: 67978

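import re

x = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

# Capture each key name plus the block of lines up to the next "key" line (or the end)
j = re.findall(r"key (.*?) key\n([\s\S]*?)(?=\nkey|$)", x)
d = {}
for i in j:
    # Pull every value out of the block and convert it to int
    k = list(map(int, re.findall(r"value (.*?)(?=\nvalue|$)", i[1])))
    d[i[0]] = k
print(d)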
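Looking at what the first `findall` returns may make the lookahead approach clearer. A short sketch on the question's sample data (the name `pairs` is only for illustration): the lookahead `(?=\nkey|$)` ends each block just before the next `key` line without consuming it, so the next match can start there.

```python
import re

# Sample input from the question, written on one line with explicit newlines
x = "key abc key\nvalue 1\nvalue 2\nvalue 3\nkey bcd key\nvalue 2\nvalue 3\nvalue 4"

# Each tuple is (key name, raw block of value lines for that key)
pairs = re.findall(r"key (.*?) key\n([\s\S]*?)(?=\nkey|$)", x)
print(pairs)
# [('abc', 'value 1\nvalue 2\nvalue 3'), ('bcd', 'value 2\nvalue 3\nvalue 4')]
```

The second `findall` in the loop then splits each block into its individual values.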

Upvotes: 0

meelo
meelo

Reputation: 582

Using regular expressions can be more robust than using string slicing to pick the values out of the text file. If you are confident in the format of your data, string slicing will be fine.
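For comparison, a sketch of the string-slicing approach with no regex at all, assuming every line is exactly `key NAME key` or `value N` (the sample text and the name `current` are illustrative only). A malformed value line would make `int()` raise `ValueError`, which is the robustness tradeoff described above:

```python
# Sample input from the question (assumed to be already read into a string)
text = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""

result = {}
current = None  # values list for the most recent key
for line in text.splitlines():
    if line.startswith('key ') and line.endswith(' key'):
        # "key abc key" -> slice off the leading "key " and trailing " key"
        current = []
        result[line[len('key '):-len(' key')]] = current
    elif line.startswith('value ') and current is not None:
        # "value 1" -> slice off "value " and convert the rest to int
        current.append(int(line[len('value '):]))

print(result)  # {'abc': [1, 2, 3], 'bcd': [2, 3, 4]}
```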

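import re

keyPat = re.compile(r'key (\w+) key')
valuePat = re.compile(r'value (\d+)')

result = {}
tempL = None  # list of values for the current key
with open('data.txt') as f:
    for line in f:
        keyMatch = keyPat.search(line)
        valueMatch = valuePat.search(line)
        if keyMatch:
            # Start a new list for this key
            tempL = []
            result[keyMatch.group(1)] = tempL
        elif valueMatch and tempL is not None:
            # Append the value to the most recent key's list
            tempL.append(int(valueMatch.group(1)))
        else:
            # Also catches a value line that appears before any key line
            print('Did not match:', line.rstrip('\n'))

print(result)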

Upvotes: 2
