Reputation: 1000
I have some data which looks like:
key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4
...
...
Based on it, what I want is to construct a data structure like:
{'abc':[1,2,3]}
{'bcd':[2,3,4]}
...
Is a regular expression a good choice for this? If so, how should I write it so that the processing behaves like a for loop, where inside the loop I can build the data structure from the data I have matched?
Thanks.
Upvotes: 0
Views: 301
Reputation: 824
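The following code should work if the data is always in that format, with exactly three values after each key.
import re

with open(FILENAME, "r") as f:
    text = f.read()
# Each (\d+) captures a whole number; (\d)+ would keep only the last digit.
regex = r'key (\S*) key\nvalue (\d+)\nvalue (\d+)\nvalue (\d+)'
matches = re.findall(regex, text)
dic = {}
for match in matches:
    # match[0] is the key; the remaining groups are the three values.
    dic[match[0]] = list(map(int, match[1:]))
print(dic)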
EDIT: The other answer by meelo is more robust, as it handles cases where a key has more or fewer than three values.
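The question also asks for a regex that "behaves like a for loop". One way to get that is re.finditer, which yields one match object per key block. The following is only a sketch under the assumption that the data keeps the format shown in the question; the names parse_blocks, block_pat and sample are made up for illustration, and the second key deliberately has only two values.

```python
import re

# Hypothetical sample in the question's format; 'bcd' has only two values.
sample = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3"""

# One match per key line, followed by any number of "value N" lines.
block_pat = re.compile(r'key (\S+) key((?:\nvalue \d+)*)')

def parse_blocks(text):
    result = {}
    # finditer yields match objects one at a time, so this is an ordinary for loop.
    for m in block_pat.finditer(text):
        # group(1) is the key; group(2) is the run of value lines that follows it.
        result[m.group(1)] = [int(v) for v in re.findall(r'\d+', m.group(2))]
    return result

print(parse_blocks(sample))
```

For the sample, this should map 'abc' to [1, 2, 3] and 'bcd' to [2, 3]. Any extra work per key can go inside the loop body.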
Upvotes: 0
Reputation: 67978
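import re

x = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""
# Capture each key and the block of lines up to the next key (or end of input).
j = re.findall(r"key (.*?) key\n([\s\S]*?)(?=\nkey|$)", x)
d = {}
for i in j:
    # i[0] is the key, i[1] the block of "value N" lines belonging to it.
    k = list(map(int, re.findall(r"value (.*?)(?=\nvalue|$)", i[1])))
    d[i[0]] = k
print(d)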
Upvotes: 0
Reputation: 582
Using regular expressions can be more robust than using string slicing to pick the values out of a text file. If you are confident in the format of your data, string slicing will be fine.
import re

keyPat = re.compile(r'key (\w+) key')
valuePat = re.compile(r'value (\d+)')
result = {}
with open('data.txt') as f:
    for line in f:
        # Search each pattern once and reuse the match object.
        keyMatch = keyPat.search(line)
        valueMatch = valuePat.search(line)
        if keyMatch:
            # A key line starts a new list; later values are appended to it.
            tempL = []
            result[keyMatch.group(1)] = tempL
        elif valueMatch:
            tempL.append(int(valueMatch.group(1)))
        else:
            print('Did not match:', line)
print(result)
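To try the same line-by-line idea without a data.txt on disk, it can be wrapped in a function that takes any iterable of lines. This is a rough sketch, not part of the original answer: parse_lines and the in-memory data string are hypothetical, and it additionally assumes that a value line appearing before any key should be reported as unmatched rather than raise an error.

```python
import re

key_pat = re.compile(r'key (\w+) key')
value_pat = re.compile(r'value (\d+)')

def parse_lines(lines):
    """Group integer values under the most recent key line."""
    result = {}
    current = None   # list belonging to the most recently seen key, if any
    unmatched = []   # lines that matched neither pattern, or values with no key yet
    for line in lines:
        key_match = key_pat.search(line)
        value_match = value_pat.search(line)
        if key_match:
            # Start (or reuse) the list for this key.
            current = result.setdefault(key_match.group(1), [])
        elif value_match and current is not None:
            current.append(int(value_match.group(1)))
        else:
            unmatched.append(line)
    return result, unmatched

# Hypothetical in-memory data in the question's format.
data = "key abc key\nvalue 1\nvalue 2\nvalue 3\nkey bcd key\nvalue 2\nvalue 3\nvalue 4\n"
result, unmatched = parse_lines(data.splitlines())
print(result)
print(unmatched)
```

An open file object can be passed in place of data.splitlines(), since both are iterables of lines.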
Upvotes: 2