Reputation: 7649
I've input data in the following format, not decided by me
key1: value1 key2: value2 key3: value3 key4 { key11: val11 key22: value22 } key5: value5 ............
The input string will have key values separated by colon or a brace bracket.
I want to tokenize it and I have the following idea:
First to have a regular expression parsing data till I find a :
or {
with priority to {
over :
Then split and read till the white space pattern that I said is reached and recursively traverse the whole string
I want to know if I can write a regex like (some_string)(special character pattern)
(special character pattern can be :
or {
with precedence to {
)(rest of the string)
If it is a :
then for rest of the string, get the string part from ' value1 ' and capture it. Work on the remaining string
If it is a {
then traverse till you find }
and internally work with :
logic defined above.
For eg
a: 1 b: 2 c { d: 3 e: 4 } f: 5
This should give
a:1
b:2
c { d: 3 e: 4 }
f: 5
Upvotes: 1
Views: 810
Reputation: 89557
You can use this pattern:
[^ ]+(?:: [^ ]+| \{[^}]+\})
example:
import re
test = "a: 1 b: 2 c { d: 3 e: 4 } f: 5"
pattern = re.compile(r"[^ ]+(?:: [^ ]+| \{[^}]+\})")
for match in pattern.findall(test):
print match
Upvotes: 4