Reputation: 57
I have a regex like this
r"^(.*?),(.*?)(,.*?=.*)"
And a string like this
name1,value1,tag11=value11,tag12=value12,tag13=value13
I am trying to check, using a regex, whether the string follows this format: name,value, followed by name and value pairs separated by commas.
I then need to extract the comma-separated data using a regex.
The first group captures name1, the second group captures value1, and the third group matches everything from tag11 through value13 (due to the greedy match).
But I want to match each name and value pair separately. I am new to Python and not sure how I can achieve this.
Upvotes: 3
Views: 424
Reputation: 9072
Turns out Python doesn't support repeated named capture groups unlike .NET, which is a bit of a shame (means my solution is a little longer than I thought it'd need to be). Does this meet your requirements?
import re


def is_valid(s):
    # Raw string, so \d reaches the regex engine unchanged.
    pattern = r'^name\d+,value\d+(,tag\d+=value\d+)*$'
    return re.match(pattern, s)


def get_name_value_pairs(s):
    if not is_valid(s):
        raise ValueError('Invalid input: {}'.format(s))
    # First alternative matches "name,value"; the second matches "tag=value".
    pattern = r'((?P<name1>\w+),(?P<value1>\w+))|(?P<name2>\w+)=(?P<value2>\w+)'
    for match in re.finditer(pattern, s):
        name1 = match.group('name1')
        name2 = match.group('name2')
        value1 = match.group('value1')
        value2 = match.group('value2')
        if name1 and value1:
            yield name1, value1
        elif name2 and value2:
            yield name2, value2


if __name__ == '__main__':
    testString = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
    assert not is_valid('')
    assert not is_valid('foo')
    assert is_valid(testString)
    print(list(get_name_value_pairs(testString)))
Output
[('name1', 'value1'), ('tag11', 'value11'), ('tag12', 'value12'), ('tag13', 'value13')]
Edit 1
Added input validation logic. Assumptions made:
name<x>,value<x>
tag<x>=value<x>
Note that I'm not currently validating that x is the same value within a name/value pair, which I assume is a requirement. I'm not sure how to do this, so I'm leaving it as an exercise for the reader.
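For what it's worth, a backreference might cover the same-x check. This is only a sketch under the assumptions listed above (name<x>,value<x> and tag<x>=value<x>); `STRICT_PATTERN` and `is_valid_strict` are hypothetical names, not part of the code above. `\1` and `\2` must repeat whatever digits the preceding `(\d+)` group captured.

```python
import re

# Hypothetical stricter variant of is_valid: \1 and \2 are backreferences
# that must repeat the digits captured by the group just before them.
STRICT_PATTERN = re.compile(r'^name(\d+),value\1(?:,tag(\d+)=value\2)*$')


def is_valid_strict(s):
    """Return True if every pair in s uses the same number on both sides."""
    return STRICT_PATTERN.match(s) is not None


if __name__ == '__main__':
    print(is_valid_strict('name1,value1,tag11=value11,tag12=value12'))
    print(is_valid_strict('name1,value2,tag11=value11'))
    print(is_valid_strict('name1,value1,tag11=value12'))
```

Inside the repeated group, `\2` refers to the most recent capture of group 2, so each tag/value pair is checked against its own number rather than the first one.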
Upvotes: 2
Reputation: 627507
First, validate the format according to your pattern, and then split with the [,=] regex (which matches , and =) and convert to a dictionary like this:
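import itertools, re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
if re.match(r'[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$', s):
    l = re.split("[=,]", s)
    # Pair up consecutive items: (l[0], l[1]), (l[2], l[3]), ...
    # zip_longest is the Python 3 name; on Python 2 use itertools.izip_longest.
    d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))
    print(d)
else:
    print("Not valid!")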
See the Python demo
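The same validate-then-split idea could also be wrapped in a function. This is a minimal sketch, not a definitive implementation: `VALID` and `parse_pairs` are hypothetical names, and the pairing uses slices with the built-in `zip` instead of itertools, which should be safe here because a string that passes validation always splits into an even number of items.

```python
import re

# Same validation pattern as in the snippet above.
VALID = re.compile(r'[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$')


def parse_pairs(s):
    """Hypothetical helper: return a dict of pairs, or None if s is invalid."""
    if not VALID.match(s):
        return None
    items = re.split('[=,]', s)
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(items[::2], items[1::2]))


if __name__ == '__main__':
    print(parse_pairs('name1,value1,tag11=value11,tag12=value12,tag13=value13'))
    print(parse_pairs('foo'))
```

Returning None for invalid input is just one possible choice; raising ValueError would work as well.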
The pattern is
^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$
Details:
^ - start of string (with re.match this can be omitted, since re.match already anchors the pattern at the start)
[^,=]+ - 1+ chars other than = and ,
, - a comma
[^,=]+ - 1+ chars other than = and ,
(?:,[^,=]+=[^,=]+)+ - 1 or more sequences of:
    , - a comma
    [^,=]+ - 1+ chars other than = and ,
    = - an equal sign
    [^,=]+ - 1+ chars other than = and ,
$ - end of string.
Upvotes: 1
Reputation: 47282
Why not just split by the commas:
s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
print(s.split(','))
If you want to use a regex, it's just as simple with the pattern:
[^,]+
Example:
https://regex101.com/r/jS6fgW/1
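In Python, that pattern could be applied with re.findall. A minimal sketch, where the variable name `fields` is only illustrative:

```python
import re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'

# Each match is a maximal run of characters that are not commas.
fields = re.findall(r'[^,]+', s)
print(fields)

# For this input the result agrees with a plain split on commas.
print(fields == s.split(','))
```

One difference to keep in mind: with empty fields (for example 'a,,b'), split keeps the empty string while the [^,]+ pattern skips it.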
Upvotes: 1