Reputation: 570
I have a string:
property1=1234, property2=102.201.333, property3=abc
I want to capture 1234 and 102.201.333. I am trying to use the regex:
property1=([^,]*)|property2=([^,]*)
But it only manages to capture one of the values. Based on this link I also tried:
((?:property1=([^,]*)|property2=([^,])+)
(?:(property1=([^,]*)|property2=([^,])+)
They capture an extra group from somewhere I can't figure out.
What am I missing?
P.S. I am using re.search().
Edit: There may be something wrong in my calling code:
import re
text = "property1=1234, property2=102.201.333, property3=abc"
m = re.search(r'property1=([^,]*)|property2=([^,]*)', text)
print(m.groups())
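For reference, this is what that call gives with the sample string above (sketch assumes Python 3). re.search() stops at the first match. The group that belongs to the alternative that did not match is None.

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"

# re.search() scans left to right and stops at the FIRST position where
# either alternative matches; here that is "property1=...".
m = re.search(r'property1=([^,]*)|property2=([^,]*)', text)

# Group 2 belongs to the alternative that did not participate, so it is None.
groups = m.groups()
print(groups)  # ('1234', None)
```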
Edit2: It doesn't have to be propertyX. It can be anything:
foo1=123, bar=101.2.3, foobar=abc
or even
foo1=123, bar=weirdbar[345], foobar=abc
Upvotes: 0
Views: 156
Reputation: 490118
Regular expressions are great for things that act like lexemes, not so good for general-purpose parsing.
In this case, though, it looks like your "configuration-y string" may consist solely of a sequence of lexemes of the form word=value [, word=value ...]. If so, you can use a regexp and repetition. The right regexp depends on the exact form of word and value, though (and, to a lesser extent, on whether you want to check for errors). For instance, is:
this="a string with spaces", that = 42, quote mark = "
allowed, or not? If so, is this set to a string with spaces (no quotes) or to "a string with spaces" (including the quotes)? Is that set to 42 with a leading blank, or just to 42? Is quote mark (which has an embedded space) allowed, and is it set to a single double-quote mark? Do double quotes, if present, "escape" commas, so that you can write:
greeting="Hello, world."
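Here is one possible set of answers to those questions, written as a sketch. It assumes keys are \w+ and that blanks are allowed around the =. It also assumes a double-quoted value may contain commas and keeps its quotes in the captured value. These are illustrative choices, not rules from the question. The names PAIR and parse_pairs are hypothetical.

```python
import re

# Assumptions (illustrative only): keys are \w+, blanks around "=" are allowed,
# and a value is either a double-quoted string (which may contain commas)
# or a run of non-comma characters.
PAIR = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^,]*)')

def parse_pairs(s):
    # findall returns one (key, value) tuple per match; strip trailing blanks
    # that [^,]* may have picked up before the next comma.
    return [(key, value.rstrip()) for key, value in PAIR.findall(s)]

pairs = parse_pairs('greeting="Hello, world.", that = 42')
print(pairs)  # [('greeting', '"Hello, world."'), ('that', '42')]
```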
Assuming spaces are forbidden, and the word and value parts are simply "alphanumerics as matched by \w":
for word, value in re.findall(r'(\w+)=(\w+)', string):
    print(word, value)
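You can see the limitation by running that loop on the question's sample string (sketch assumes Python 3). The value for property2 comes out as just 102, because \w does not match ".".

```python
import re

string = "property1=1234, property2=102.201.333, property3=abc"

# \w matches letters, digits and underscore, but not ".", so the value
# match for property2 stops at the first dot.
pairs = re.findall(r'(\w+)=(\w+)', string)
for word, value in pairs:
    print(word, value)
```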
It's clear from the 102.201.333 value that \w is not sufficient for the value match, though. If value is "everything not a comma" (which includes whitespace), then:
for word, value in re.findall(r'(\w+)=([^,]+)', string):
    print(word, value)
gets closer. These all ignore "junk" and disallow spaces around the = sign. If string is "$a=this, b = that, c=102.201.333,,", the second for loop prints:
a this
c 102.201.333
The dollar sign (not an alphanumeric character) is ignored, the value for b is ignored because of the whitespace around its = sign, and the two commas after the value for c are also ignored.
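The script below is a quick check of the second pattern (sketch assumes Python 3). It runs against the test string above and the two sample strings from the question's Edit2, copied verbatim. It shows that keys other than propertyX and values like weirdbar[345] are captured.

```python
import re

PATTERN = r'(\w+)=([^,]+)'

samples = [
    "$a=this, b = that, c=102.201.333,,",
    "foo1=123, bar=101.2.3, foobar=abc",
    "foo1=123, bar=weirdbar[345], foobar=abc",
]

# One list of (word, value) tuples per sample string.
results = [re.findall(PATTERN, s) for s in samples]
for s, pairs in zip(samples, results):
    print(s)
    for word, value in pairs:
        print(f'  {word} {value}')
```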
Upvotes: 1
Reputation: 538
I have tried building a regular expression for you that gives you the values after property1= and property2=, but I am not sure how you would use it in Python.
Edit: it now also captures values whose key before the '=' sign is something other than propertyX.
This is my original regular expression which does capture the value.
(?<=[\w]=).*?[^,]+
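and this is the same pattern in JavaScript-style notation with the g (global) flag:
/(?<=[\w]=).*?[^,]+/g
Python has no /.../g literal syntax; there, the equivalent is to pass the bare pattern to re.findall(), which returns every non-overlapping match.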
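In Python, a sketch of the same idea passes the bare pattern to re.findall() (assumes Python 3). Note that this pattern returns only the values, not the keys in front of the = sign.

```python
import re

# The lookbehind (?<=[\w]=) is fixed-width (two characters), which Python's
# re module requires. re.findall() returns every non-overlapping match,
# playing the role of the /g flag.
VALUE = r'(?<=[\w]=).*?[^,]+'

text = "property1=1234, property2=102.201.333, property3=abc"
values = re.findall(VALUE, text)
print(values)  # ['1234', '102.201.333', 'abc']
```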
Upvotes: 0
Reputation: 86270
As an alternative, we could use some string splitting to create a dictionary.
text = "property1=1234, property2=102.201.333, property3=abc"
data = dict(p.split('=') for p in text.split(', '))
print(data["property2"])  # 102.201.333
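That one-liner relies on every separator being exactly ", " and on values containing no "=". A slightly more forgiving sketch is shown below; the function name to_dict is just illustrative. It splits on commas alone, strips the blanks, splits each item only at its first "=", and skips empty items such as the one after a trailing comma.

```python
def to_dict(text):
    # split(',') tolerates any spacing after the commas; strip() removes it.
    # split('=', 1) keeps any further "=" inside the value.
    # An item with no "=" at all would still raise ValueError in dict().
    return dict(item.strip().split('=', 1)
                for item in text.split(',')
                if item.strip())

data = to_dict("foo1=123, bar=weirdbar[345], foobar=abc")
print(data['bar'])  # weirdbar[345]
```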
Upvotes: 1
Reputation: 9112
You could try:
property_regex = re.compile(r'property[0-9]+=(?P<property_value>[^,\s]+)')
That matches any property value after the equals sign, up to the next comma or whitespace. The value is then accessible through the group name property_value, as the Python re documentation describes:
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
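Here is a sketch of reading the named group for every match with finditer() (assumes Python 3). It uses [^,\s]+ rather than [^\s]+ for the value. With [^\s]+, the comma right after 1234 would be captured as part of the value.

```python
import re

# [^,\s]+ stops at a comma or whitespace, so the trailing "," is not captured.
property_regex = re.compile(r'property[0-9]+=(?P<property_value>[^,\s]+)')

text = "property1=1234, property2=102.201.333, property3=abc"

# finditer() yields one match object per occurrence; group() accepts the name.
values = [m.group('property_value') for m in property_regex.finditer(text)]
print(values)  # ['1234', '102.201.333', 'abc']
```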
Upvotes: 0
Reputation: 282104
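You're using a |, which means your regex will match either the thing on the left of the bar or the thing on the right. re.search() then returns only the first such match, so you get one value, and the group for the other alternative is None.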
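One way to keep the alternation but still get both values is to use re.findall(), which returns every match (sketch assumes Python 3). Each match fills only the group of the alternative that matched, so you pick whichever group is non-empty.

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"

# findall() returns every match; each tuple has one entry per group, and the
# group from the alternative that did not match is the empty string.
pairs = re.findall(r'property1=([^,]*)|property2=([^,]*)', text)
print(pairs)  # [('1234', ''), ('', '102.201.333')]

# Keep whichever group actually matched.
values = [first or second for first, second in pairs]
print(values)  # ['1234', '102.201.333']
```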
Upvotes: 0