SANDeveloper

Reputation: 570

Python Regex - Match multiple expression with groups

I have a string:

property1=1234, property2=102.201.333, property3=abc

I want to capture 1234 and 102.201.333. I am trying to use the regex:

property1=([^,]*)|property2=([^,]*)

But it only manages to capture one of the values. Based on this link I also tried:

((?:property1=([^,]*)|property2=([^,])+)
(?:(property1=([^,]*)|property2=([^,])+)

They capture an extra group that I can't account for.

What am I missing?

P.S. I am using re.search().

Edit: There may be something wrong in my calling code:

m = re.search('property1=([^,]*)|property2=([^,]*)', text);
print m.groups()

Edit2: It doesn't have to be propertyX. It can be anything:

foo1=123, bar=101.2.3, foobar=abc

even

foo1=123, bar=weirdbar[345], foobar=abc

Upvotes: 0

Views: 156

Answers (6)

torek

Reputation: 490118

Regular expressions are great for things that act like lexemes, not so good for general purpose parsing.

In this case, though, it looks like your "configuration-y string" may consist solely of a sequence of lexemes of the form: word = value [ , word = value ... ]. If so, you can use a regexp and repetition. The right regexp depends on the exact form of word and value, though (and to a lesser extent, whether you want to check for errors). For instance, is:

this="a string with spaces", that = 42, quote mark = "

allowed, or not? If so, is this set to a string with spaces (no quotes) or "a string with spaces" (includes quotes)? Is that set to  42 (which has a leading blank) or just 42 (which does not)? Is quote mark (which has embedded spaces) allowed, and is it set to one double-quote mark? Do double quotes, if present, "escape" commas, so that you can write:

greeting="Hello, world."

Assuming spaces are forbidden, and the word and value parts are simply "alphanumerics as matched by \w":

for word, value in re.findall(r'([\w]+)=([\w]+)', string):
    print word, value

It's clear from the 102.201.333 value that \w is not sufficient for the value match, though. If value is "everything not a comma" (which includes whitespace), then:

for word, value in re.findall(r'([\w]+)=([^,]+)', string):
    print word, value

gets closer. These all ignore "junk" and disallow spaces around the = sign. If string is "$a=this, b = that, c=102.201.333,,", the second for loop prints:

a this
c 102.201.333

The dollar-sign (not an alphanumeric character) is ignored, the value for b is ignored due to white-space, and the two commas after the value for c are also ignored.
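
As a rough sketch of how the pattern could be loosened for the examples in the question's second edit, the version below tolerates blanks around the = sign and strips stray whitespace from each value. The names PAIR_RE and parse_pairs, and the decision to strip whitespace, are assumptions made for illustration; the right rules depend on the answers to the questions raised above.

```python
import re

# Assumption: a key is one or more \w characters, a value is everything
# up to the next comma, and blanks around "=" are tolerated.
PAIR_RE = re.compile(r'(\w+)\s*=\s*([^,]+)')

def parse_pairs(text):
    """Return a list of (word, value) tuples found in text."""
    return [(word, value.strip()) for word, value in PAIR_RE.findall(text)]

pairs = parse_pairs("foo1=123, bar=weirdbar[345], foobar=abc")
print(pairs)
```

With this variant the b = that pair from the "junk" example is picked up as well, because the whitespace around = is now allowed.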

Upvotes: 1

Robert Dinaro

Reputation: 538

I have tried building a regular expression for you that will give you the values after property1= and property2=, but I am not sure how you would use it in Python.

Edit

It now also matches keys other than property before the '=' sign.

This is my original regular expression which does capture the value.

(?<=[\w]=).*?[^,]+

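and this is the same expression written with /.../g delimiters and the global flag, as used in JavaScript or Perl. Python has no such literal syntax; there you would pass the bare pattern above to re.findall() to get every match.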

/(?<=[\w]=).*?[^,]+/g
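
A minimal sketch of how the lookbehind pattern might be applied from Python, using the sample string from the question. It relies on re.findall(), which returns every non-overlapping match and so plays the role of a global flag:

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"

# The lookbehind (?<=[\w]=) requires "<word char>=" just before the match
# without consuming it, so only the values themselves are returned.
values = re.findall(r'(?<=[\w]=).*?[^,]+', text)
print(values)  # ['1234', '102.201.333', 'abc']
```

Note that this returns the value of every key, so picking out only property1 and property2 would still need a filter on the key name.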

Upvotes: 0

Brigand

Reputation: 86270

As an alternative, we could use some string splitting to create a dictionary.

text = "property1=1234, property2=102.201.333, property3=abc"
data = dict(p.split('=') for p in text.split(', '))
print data["property2"] # '102.201.333'
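
The one-liner assumes that every separator is exactly a comma followed by one space and that no value contains an = sign. A slightly more forgiving sketch (the function name to_dict is hypothetical) splits on the comma alone, strips blanks, and splits each pair only at the first =:

```python
def to_dict(text):
    """Build a dict from a 'key=value, key=value' string."""
    result = {}
    for pair in text.split(','):
        if '=' not in pair:
            continue  # skip empty or malformed chunks
        key, value = pair.split('=', 1)  # split at the first "=" only
        result[key.strip()] = value.strip()
    return result

data = to_dict("property1=1234,property2=102.201.333 , property3=a=b")
print(data["property2"])  # 102.201.333
```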

Upvotes: 1

tenstar

Reputation: 10516

Try this:

property_regex = re.compile(r'property[0-9]+=([^\s,]+)')

Upvotes: 0

PepperoniPizza

Reputation: 9112

You could try:

property_regex = re.compile(r'property[0-9]+=(?P<property_value>[^\s,]+)')

That would match any property value after the equals sign, up to the next comma or whitespace. The value would be accessible by the name property_value, just like the documentation says:

Copied from the Python re documentation:

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
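
A short sketch of how the named group could be read back, using the sample string from the question. The character class used here excludes commas as well as whitespace, so that the trailing separator is not captured with the value:

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"
property_regex = re.compile(r'property[0-9]+=(?P<property_value>[^\s,]+)')

# finditer yields one match object per property; the named group is read
# back with m.group('property_value').
values = [m.group('property_value') for m in property_regex.finditer(text)]
print(values)  # ['1234', '102.201.333', 'abc']
```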

Upvotes: 0

user2357112

Reputation: 282104

You're using a |. That means your regex will match either the thing on the left of the bar, or the thing on the right.
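
To make that concrete, here is a small sketch with the pattern and string from the question. re.search() stops at the first match, so only the group belonging to the branch that matched is filled; re.findall() keeps scanning and returns one tuple per match, with an empty string for the branch that did not take part:

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"
pattern = r'property1=([^,]*)|property2=([^,]*)'

first = re.search(pattern, text)
print(first.groups())  # ('1234', None)

every = re.findall(pattern, text)
print(every)  # [('1234', ''), ('', '102.201.333')]

# Keep whichever group matched in each tuple.
wanted = [a or b for a, b in every]
print(wanted)  # ['1234', '102.201.333']
```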

Upvotes: 0
