Alex

Reputation: 3267

How to match everything except a double character with a Python regular expression?

I have this string and this re.findall call:

import re

txt = """
  dx d_2,222.22     ,,
  dy h..{3,333.33}  ,,
dz b#(1,111.11) ,,   dx-ay relative 4,444.44 ,, 
"""
# Group 1 is the axis name; group 2 is meant to be the value up to ",,"
for axis, value in re.findall(r'([-\w]+){1}\W+([^,{2}]+)\s+,,\W+', txt):
    print("a:", axis)
    print("v:", value)

In the second (value) group I am trying to match anything except a double comma, but the exclusion seems to apply to a single "," as well. I can get the result in this example with a simple (.*?), but for certain reasons it has to be everything except ",,". Thank you.

EDIT: To see what I want to accomplish, use r'([-\w]+){1}\W+(.*?)\s+,,\W+' instead. It gives this output:

a: dx
v: d_2,222.22
a: dy
v: h..{3,333.33}
a: dz
v: b#(1,111.11)
a: dx-ay
v: relative 4,444.44

EDIT #2: Please note that an answer which does not include the double-comma exclusion is not what I need. There should be a solution. So the pattern is:

Any whitespace, then a word that may contain "-", then " ", then everything up to ",," but not the ",," itself.

Upvotes: 2

Views: 987

Answers (2)

Alex

Reputation: 3267

r'(?<=,,)\s+([-\w]+)\s(.*?)(?:,,)' is the expression that is needed here. It is much simpler than I thought.

(?<=,,) is a positive lookbehind assertion: it matches only at a position that comes right after a double comma, because the lookbehind backs up 2 characters and checks whether the contained pattern matches there.

The final (?:,,) is the non-capturing version of regular parentheses, so the closing ",," is consumed but not captured, and everything in between is matched by the two groups.

\s or \s+ is there only because of this specific type of string.
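
As a quick check, here is a minimal sketch (Python 3 syntax) that runs this expression on the sample string from the question. The names pattern and pairs are only illustrative. Assuming the input looks like that sample, two things are worth being aware of: the first record (dx) is not preceded by ",,", so the lookbehind skips it, and the lazy group keeps the spaces in front of ",,", hence the strip() call.

```python
import re

# Sample string from the question, written out with explicit newlines
txt = (
    "\n"
    "  dx d_2,222.22     ,,\n"
    "  dy h..{3,333.33}  ,,\n"
    "dz b#(1,111.11) ,,   dx-ay relative 4,444.44 ,, \n"
)

# Lookbehind: a match may only start right after a double comma
pattern = re.compile(r'(?<=,,)\s+([-\w]+)\s(.*?)(?:,,)')

# strip() removes the whitespace that the lazy group captures before ",,"
pairs = [(axis, value.strip()) for axis, value in pattern.findall(txt)]

for axis, value in pairs:
    print("a:", axis)
    print("v:", value)
```

If the first record is needed as well, the lookbehind would have to be dropped or made to accept the start of the string.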

Upvotes: 1

Braj

Reputation: 46861

[^,{2}] is a character class that matches any character except: ',', '{', '2', '}'

With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters.
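
A small sketch (Python 3 syntax) of what that character class does in practice. The name char_class is only illustrative, and the two test strings are values taken from the question's sample.

```python
import re

# Inside [...] the braces are literal characters, not a quantifier, so this
# class excludes exactly four characters: ',', '{', '2' and '}'
char_class = re.compile(r'[^,{2}]+')

# Each run stops at any one of the excluded characters
braces_result = char_class.findall("h..{3,333.33}")
digits_result = char_class.findall("d_2,222.22")

print(braces_result)  # ['h..', '3', '333.33']
print(digits_result)  # ['d_', '.']
```

This is why the value group in the question breaks off at a single comma, and also at any "2", "{" or "}".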

It should be ([^,]{2})+

(                        group and capture to \1
  [^,]{2}                  any character except: ',' (2 times)
)+                       end of \1 

Get the matched group from index 1 and 2

 ([-\w]+)\s+(.*?)\s+,,

sample code:

import re
# group 1: axis name, group 2: value in front of ",,"
p = re.compile(r'([-\w]+)\s+(.*?)\s+,,')
test_str = u"..."

re.findall(p, test_str)

Note: use \s* instead of \s+ if spaces are optional.
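
Since test_str is elided above, here is a sketch (Python 3 syntax) that assumes the sample string from the question as input. The second pattern is not part of this answer: it is an illustrative variant that uses a negative lookahead, (?!,,), to spell out "everything except a double comma" literally. The names lazy, explicit, lazy_pairs and explicit_pairs are made up for the example.

```python
import re

# Assumption: the sample string from the question is the input
txt = (
    "\n"
    "  dx d_2,222.22     ,,\n"
    "  dy h..{3,333.33}  ,,\n"
    "dz b#(1,111.11) ,,   dx-ay relative 4,444.44 ,, \n"
)

# Pattern from this answer: the lazy (.*?) stops at the first run of
# whitespace that is followed by ",,"
lazy = re.compile(r'([-\w]+)\s+(.*?)\s+,,')

# Illustrative variant: a character is added to the value only if a
# double comma does not start at that position
explicit = re.compile(r'([-\w]+)\s+((?:(?!,,).)+?)\s*,,')

lazy_pairs = lazy.findall(txt)
explicit_pairs = explicit.findall(txt)

for axis, value in explicit_pairs:
    print("a:", axis)
    print("v:", value)
```

On this input both patterns should give the same four axis/value pairs as the output listed in the question; single commas inside a value are kept, and only ",," ends it.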

Upvotes: 3
