Mohan Raj

Reputation: 57

Regex match in Python

I have a regex like this

r"^(.*?),(.*?)(,.*?=.*)"

And a string like this

name1,value1,tag11=value11,tag12=value12,tag13=value13

I am trying to check, using a regex, whether the string follows this format: a name, then a value, then name=value pairs, all separated by commas.

I need then to extract the comma-separated data using a regex.

The first group captures name1 and the second captures value1, but the third group captures everything from tag11 to value13 (because the final .* is greedy).

But I want to match each name and value pair separately. I am new to Python and not sure how to achieve this.

Upvotes: 3

Views: 424

Answers (3)

Tagc

Reputation: 9072

It turns out that, unlike .NET, Python doesn't support repeated named capture groups, which is a bit of a shame (it means my solution is a little longer than I thought it would need to be). Does this meet your requirements?

import re

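def is_valid(s):
    # Raw string so that \d reaches the regex engine as written.
    pattern = r'^name\d+,value\d+(,tag\d+=value\d+)*$'
    return re.match(pattern, s)

def get_name_value_pairs(s):
    if not is_valid(s):
        raise ValueError('Invalid input: {}'.format(s))

    # First alternative: the leading "name,value" pair.
    # Second alternative: each following "tag=value" pair.
    pattern = r'((?P<name1>\w+),(?P<value1>\w+))|(?P<name2>\w+)=(?P<value2>\w+)'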
    for match in re.finditer(pattern, s):
        name1 = match.group('name1')
        name2 = match.group('name2')
        value1 = match.group('value1')
        value2 = match.group('value2')

        if name1 and value1:
            yield name1, value1
        elif name2 and value2:
            yield name2, value2

if __name__ == '__main__':
    testString = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
    assert not is_valid('')
    assert not is_valid('foo')
    assert is_valid(testString)

    print(list(get_name_value_pairs(testString)))

Output

[('name1', 'value1'), ('tag11', 'value11'), ('tag12', 'value12'), ('tag13', 'value13')]

Edit 1

Added input validation logic. Assumptions made:

  • Must have initial name/value pair in form name<x>,value<x>
  • All following pairs must be in form tag<x>=value<x>
  • Names and values consist only of alphanumeric characters
  • Whitespace is not allowed

Note that I'm not currently validating that x is the same within a name/value pair, which I assume is a requirement. I'm not sure how to do this, so I'm leaving it as an exercise for the reader.
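
One possible way to check that x matches within each pair is a backreference. This is only a sketch, assuming the same name<x>,value<x> and tag<x>=value<x> layout as above; is_consistent is a hypothetical helper name, not part of the solution above.

```python
import re

# \1 and \2 force the digits after "value" to repeat the digits
# captured after "name" / "tag" in the same pair.
PATTERN = re.compile(r'^name(\d+),value\1(?:,tag(\d+)=value\2)*$')

def is_consistent(s):
    """Hypothetical helper: True if every pair uses the same x on both sides."""
    return PATTERN.match(s) is not None

print(is_consistent('name1,value1,tag11=value11,tag12=value12'))  # True
print(is_consistent('name1,value2'))                              # False
print(is_consistent('name1,value1,tag11=value12'))                # False
```

Inside the repeated group, \2 refers to whatever (\d+) captured in the current repetition, so each tag/value pair is checked against its own number.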

Upvotes: 2

Wiktor Stribiżew

Reputation: 627507

First, validate the format against your pattern, then split with the [,=] regex (which matches , and =) and convert the result to a dictionary like this:

import itertools, re
s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
if re.match(r'[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$', s):
    l = re.split("[=,]", s)
    d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))  # Python 3 name; on Python 2 use itertools.izip_longest
    print(d)
else:
    print("Not valid!")

See the Python demo

The pattern is

^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$

Details:

  • ^ - start of string (with re.match this can be omitted, since re.match only matches at the start of the string)
  • [^,=]+ - 1+ chars other than = and ,
  • , - a comma
  • [^,=]+ - 1+ chars other than = and ,
  • (?:,[^,=]+=[^,=]+)+ - 1 or more sequences of:
    • , - comma
    • [^,=]+ - 1+ chars other than = and ,
    • = - an equal sign
    • [^,=]+ - 1+ chars other than = and ,
  • $ - end of string.
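
As a rough sketch of the same validate-then-split idea on Python 3, the pairing can also be done with plain slicing instead of itertools. The name parse_pairs is an assumption made for illustration; the pattern is the one described above.

```python
import re

VALID = re.compile(r'^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$')

def parse_pairs(s):
    """Hypothetical helper: return a dict of pairs, or None if s is not valid."""
    if not VALID.match(s):
        return None
    parts = re.split(r'[,=]', s)
    # A valid string always splits into an even number of parts:
    # even indexes hold the names, odd indexes hold the values.
    return dict(zip(parts[::2], parts[1::2]))

print(parse_pairs('name1,value1,tag11=value11,tag12=value12,tag13=value13'))
# {'name1': 'value1', 'tag11': 'value11', 'tag12': 'value12', 'tag13': 'value13'}
print(parse_pairs('foo'))
# None
```

Because the validation runs first, no fill value is needed when pairing up the parts.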

Upvotes: 1

l'L'l

Reputation: 47282

Why not just split on the commas?

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
print(s.split(','))

If you want to use a regex, it's just as simple with the pattern:

[^,]+

Example:

https://regex101.com/r/jS6fgW/1
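
A minimal sketch of the regex variant in Python, assuming re.findall is an acceptable way to apply the pattern (the regex101 link only shows the pattern itself). The extra split on = is an added step for getting at the individual names and values.

```python
import re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'

# [^,]+ matches each run of non-comma characters, so for a string with
# no empty fields this gives the same items as s.split(',').
fields = re.findall(r'[^,]+', s)
print(fields)
# ['name1', 'value1', 'tag11=value11', 'tag12=value12', 'tag13=value13']

# Split the tag=value items once more to get (name, value) tuples.
pairs = [tuple(f.split('=', 1)) for f in fields[2:]]
print(pairs)
# [('tag11', 'value11'), ('tag12', 'value12'), ('tag13', 'value13')]
```

Note that neither version checks the overall format; they only break the string apart.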

Upvotes: 1
