Reputation: 5582
In a file I can have either of the following two string formats:
::WORD1::WORD2= ANYTHING
::WORD3::WORD4::WORD5= ANYTHING2
This is the regex I came up with:
::(\w+)(?:::(\w+))?::(\w+)=(.*)
regex.findall(..)
[(u'WORD1', u'', u'WORD2', u' ANYTHING'),
(u'WORD3', u'WORD4', u'WORD5', u' ANYTHING2')]
My first question is: why do I get this empty u'' when matching the first string?
My second question is: is there an easier way to write this regex? The two strings are very similar, except that sometimes I have this extra ::WORD5.
My last question is: most of the time I have only a word between the ::, which is why \w+ is enough, but sometimes I get things like 2-WORD2 or 3-2-WORD2, where a - appears. How can I add it to the \w+?
Upvotes: 0
Views: 197
Reputation: 1862
Based on thg435's answer, you can split on the "=" first and then do exactly the same thing, something like:
left,right = a.split('=', 1)
answer = left.split('::')[1:] + [right]
Upvotes: 0
Reputation: 214959
Captured groups are always included in re.findall results, even if they don't match anything. That's why you get an empty string. If you just want to get what's between the delimiters, try split instead of findall:
import re

a = '::WORD1::WORD2= ANYTHING'
b = '::WORD3::WORD4::WORD5= ANYTHING2'
# Split on either delimiter; the first element is the empty string before the leading '::'.
print(re.split(r'::|= ', a)[1:])  # ['WORD1', 'WORD2', 'ANYTHING']
print(re.split(r'::|= ', b)[1:])  # ['WORD3', 'WORD4', 'WORD5', 'ANYTHING2']
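To see the empty-string behaviour in isolation, here is a small sketch that reuses the pattern from the question. The variable names are only illustrative, and filtering out empty entries is one possible workaround, not the only one (it would also drop an empty ANYTHING).

```python
import re

# Pattern from the question; the middle group is optional.
pattern = re.compile(r'::(\w+)(?:::(\w+))?::(\w+)=(.*)')

line = '::WORD1::WORD2= ANYTHING'

# findall reports a group that did not take part in the match as ''.
found = pattern.findall(line)

# A match object reports the same group as None instead.
groups = pattern.match(line).groups()

# Dropping the empty entries keeps only the parts that actually matched.
compact = [part for part in found[0] if part]
```

So the empty string is not a failed match, just a placeholder that keeps every result tuple the same length.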
In response to the comments: if "ANYTHING" could be, well, anything, it's easier to use string functions rather than regexps:
x, y = a.split('= ', 1)
results = x.split('::')[1:] + [y]
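As a rough sketch, the two lines above could be wrapped in a helper and tried on a value that itself contains the delimiters. The name parse_line and the sample input are made up for illustration, and the helper assumes every line contains '= ' (otherwise the unpacking raises ValueError).

```python
def parse_line(line):
    """Split '::A::B= value' into ['A', 'B', 'value'] (hypothetical helper)."""
    # Split only at the first '= ' so the value is kept intact.
    keys, value = line.split('= ', 1)
    # The text before the leading '::' is empty, so skip it.
    return keys.split('::')[1:] + [value]


# A value that contains both '::' and '= ' stays in one piece.
tricky = parse_line('::WORD1::WORD2= a::b = c')
plain = parse_line('::WORD3::WORD4::WORD5= ANYTHING2')
```

With the re.split version, the same tricky input would be cut again inside the value, which is why plain string functions are the safer choice here.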
Upvotes: 1
Reputation: 1862
For your last question, you can use a character class like this one, which accepts letters, digits and "-":
[a-zA-Z0-9\-]+
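A possible way to plug this into the pattern from the question is sketched below. It uses [\w-]+ rather than the class above, which is an assumption on my part: it additionally accepts the underscore (and whatever else \w matches), so use the stricter class if that matters. The sample lines are invented.

```python
import re

# Hypothetical variant of the question's pattern: each \w+ becomes [\w-]+
# so that hyphens are allowed inside a word.
word = r'[\w-]+'
pattern = re.compile(r'::(%s)(?:::(%s))?::(%s)=(.*)' % (word, word, word))

lines = [
    '::WORD1::2-WORD2= ANYTHING',
    '::WORD3::WORD4::3-2-WORD5= ANYTHING2',
]

# groups() gives None for the optional middle word when it is absent.
results = [pattern.match(line).groups() for line in lines]
```

Placing the hyphen last inside the brackets means it needs no backslash, although escaping it as in the class above is harmless.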
Upvotes: 0