Johny19

Reputation: 5582

Matching two almost identical strings (Python)

In a file I can have either of the following two string formats:

::WORD1::WORD2= ANYTHING
::WORD3::WORD4::WORD5= ANYTHING2

This is the regex I came up with:

::(\w+)(?:::(\w+))?::(\w+)=(.*)

regex.findall(..)

[(u'WORD1', u'', u'WORD2', u' ANYTHING'),
 (u'WORD3', u'WORD4', u'WORD5', u' ANYTHING2')]

My first question is: why do I get this empty u'' when matching the first string?

My second question is: is there an easier way to write this regex? The two strings are very similar, except that sometimes I have this extra ::WORD5.

My last question is: most of the time I have only word characters between the ::, which is why \w+ is enough, but sometimes I get things like 2-WORD2 or 3-2-WORD2, where a - appears. How can I allow the - alongside \w+?

Upvotes: 0

Views: 197

Answers (4)

Alexis

Reputation: 1862

Building on thg435's answer, you can split on "=" first and then do exactly the same thing, something like:

left, right = a.split('=', 1)
answer = left.split('::')[1:] + [right]

Upvotes: 0

georg

Reputation: 214959

Every capturing group is included in re.findall results, even if it did not take part in the match. That's why you get an empty string. If you just want what's between the delimiters, try split instead of findall:

import re

a = '::WORD1::WORD2= ANYTHING'
b = '::WORD3::WORD4::WORD5= ANYTHING2'

# [1:] drops the empty string that precedes the leading '::'
print(re.split(r'::|= ', a)[1:])  # ['WORD1', 'WORD2', 'ANYTHING']
print(re.split(r'::|= ', b)[1:])  # ['WORD3', 'WORD4', 'WORD5', 'ANYTHING2']

In response to the comments: if "ANYTHING" could be, well, anything (including "::" or "= "), it's easier to use string functions rather than regexps:

x, y = a.split('= ', 1)
results = x.split('::')[1:] + [y]
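
If you would rather keep the original regex, here is a minimal sketch (Python 3, not from the answer above) of why the empty string appears and one way to drop it. The names PATTERN and parse_line are made up for illustration. A match object reports a group that did not participate as None, while findall turns it into '':

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'::(\w+)(?:::(\w+))?::(\w+)=(.*)')

def parse_line(line):
    """Return the captured parts of a line, skipping groups that did not participate."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # groups() gives None for the optional group when it is absent;
    # findall would give '' in the same position.
    return [g for g in m.groups() if g is not None]

for line in ['::WORD1::WORD2= ANYTHING', '::WORD3::WORD4::WORD5= ANYTHING2']:
    print(PATTERN.findall(line), parse_line(line))
```

Filtering on None rather than on '' keeps a value that is legitimately empty, for example a line ending in "=" with nothing after it.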

Upvotes: 1

Ria

Reputation: 10347

For the last question:

[\w\-]+

Explanation:

\w matches any word character (letters, digits, and underscore), and \- adds a literal hyphen to the character class.
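
As a rough sketch of how this could be plugged into the regex from the question (Python 3; PATTERN and parse are hypothetical names, and the sample line is invented from the "3-2-WORD2" example), each \w+ is replaced by [\w-]+. A hyphen placed last in a character class is literal, so the backslash is optional:

```python
import re

# Assumed variant of the question's pattern: every \w+ becomes [\w-]+
PATTERN = re.compile(r'::([\w-]+)(?:::([\w-]+))?::([\w-]+)=(.*)')

def parse(line):
    """Return the tuple of groups for a matching line, or None if it does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(parse('::WORD1::3-2-WORD2= ANYTHING'))
print(parse('::WORD3::2-WORD4::WORD5= ANYTHING2'))
```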

Upvotes: 1

Alexis

Reputation: 1862

For your last question, you can use something like the following, which accepts letters, digits, and "-":

[a-zA-Z0-9\-]+
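
One thing to keep in mind is that this class is narrower than \w: it rejects underscores, and in Python 3 it also rejects non-ASCII letters that \w accepts for str patterns. A small sketch to compare the two (the variable names and the sample tokens are made up for illustration):

```python
import re

ascii_only = re.compile(r'[a-zA-Z0-9\-]+')  # ASCII letters, digits, hyphen
word_chars = re.compile(r'[\w\-]+')         # \w also covers '_' and Unicode letters

for token in ['3-2-WORD2', 'MY_WORD', 'caf\u00e9']:
    # fullmatch succeeds only if the whole token fits the class
    print(token, bool(ascii_only.fullmatch(token)), bool(word_chars.fullmatch(token)))
```

Which one is right depends on whether underscores or accented letters can show up between the :: delimiters.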

Upvotes: 0
