darksideofthesun

Reputation: 621

Python regex: using or statement

I may not be saying this right (I'm a total regex newbie). Here's the code I currently have:

bugs.append(re.compile(r"^(\d+)").match(line).group(1))

I'd like to add to the regex so it looks at either '\d+' (starts with digits) or that it starts with 2 capital letters and contains a '-' before the first whitespace. I have the regex for the capital letters:

^[A-Z]{2,}

but I'm not sure how to add the '-' and then make an OR with the \d+. Does this make sense? Thanks!

Upvotes: 2

Views: 531

Answers (3)

Michael Laszlo

Reputation: 12239

Write | for "or". For a sequence of zero or more non-whitespace characters, write \S*.

re.compile(r'^(\d+|[A-Z][A-Z]\S*-\s)')
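To see what this pattern accepts, here is a small check. The helper name `first_token` and the sample lines are made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer above, written as a raw string.
pattern = re.compile(r'^(\d+|[A-Z][A-Z]\S*-\s)')

def first_token(line):
    """Return the captured prefix, or None if the line does not match.

    Hypothetical helper, not part of the original question.
    """
    m = pattern.match(line)
    return m.group(1) if m else None

# Invented sample lines.
print(first_token("12345 fix crash"))    # 12345
print(first_token("AB-123- fix crash"))  # 'AB-123- ' (trailing space is captured)
print(first_token("AB-123 fix crash"))   # None
```

Note that the trailing `-\s` requires the hyphen to sit directly before the whitespace, so `AB-123 fix crash` is rejected. If the hyphen may appear anywhere in the first token, something like `[A-Z][A-Z]\S*-\S*` may be closer to what the question asks.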

Upvotes: 1

Kevin

Reputation: 30161

re.compile(r"""
^  # beginning of the line
(?:  # non-capturing group; do not return this group in .group()
 (\d+)  # one or more digits, captured as a group
|  # Or
 [A-Z]{2}  # Exactly two uppercase letters
 \S*  # Any number of non-whitespace characters
 -  # the dash you wanted
)  # end of the non-capturing group
""",
re.X)  # enable comments in the regex
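One thing to watch with this layout: only the digits are captured, so `.group(1)` is `None` when the capital-letter branch matches. A minimal sketch, assuming you want the matched text in both cases (the `extract` helper and sample lines are hypothetical):

```python
import re

pattern = re.compile(r"""
    ^                 # beginning of the line
    (?:               # non-capturing group
        (\d+)         # one or more digits, captured as group 1
      |               # or
        [A-Z]{2}\S*-  # two capitals, any non-whitespace, then a dash (not captured)
    )
""", re.X)

def extract(line):
    """Return the whole matched prefix, or None if nothing matched."""
    m = pattern.match(line)
    if m is None:
        return None
    # group(0) is the full match, whichever branch was taken.
    return m.group(0)

m = pattern.match("AB-123 fix")
print(m.group(1))                   # None: the digit group did not take part
print(m.group(0))                   # AB-
print(extract("42 is the answer"))  # 42
```

In the original `bugs.append(...)` line, calling `.group(1)` on a capital-letter match would therefore append `None`, so `.group(0)` or a capturing group around the whole alternation is probably what you want.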

Upvotes: 0

abarnert

Reputation: 365815

The way to do an OR in regexps is with the "alternation" or "pipe" operator, |.

For example, to match either one or more digits, or two or more capital letters:

^(\d+|[A-Z]{2,})

Debuggex Demo

You may sometimes need to add, remove, or move parentheses to get the precedence right. The way I've written it, you've got one group that captures either the digit string or the capitals. While you're learning the rules (in fact, even after you've learned the rules) it's helpful to look at a regular expression visualizer/debugger like the one I used.
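To make the precedence point concrete, here is a small sketch comparing the grouped pattern with an ungrouped variant. The sample line is invented. Without the parentheses, `|` splits the whole pattern, so `^` only applies to the digit branch:

```python
import re

grouped = re.compile(r'^(\d+|[A-Z]{2,})')
# Without parentheses this parses as (^\d+) | ([A-Z]{2,}),
# so the capital-letter branch is not anchored to the start.
ungrouped = re.compile(r'^\d+|[A-Z]{2,}')

line = "see ticket AB42"  # invented sample line

print(grouped.search(line))            # None
print(ungrouped.search(line).group())  # AB
```

The difference shows up with `search()`; `match()` always starts at the beginning of the string, so it would hide the problem.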


Your rule is slightly more complicated: you want 2 or more capital letters, and a hyphen before the first space. That's a bit hard to write as is, but if you change it to two or more capital letters, zero or more non-space characters, and a hyphen, that's easy:

^(\d+|[A-Z]{2,}\S*?-)

Debuggex Demo

(Notice the \S*? there. The ? makes the match lazy, so it takes as few characters as possible instead of as many as possible, and we'll only match up to the first hyphen in THIS-IS-A-TEST instead of up to the last. If you want the other behavior, just drop the ?.)
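A quick way to check the lazy versus greedy behavior yourself. The sample line is made up apart from the THIS-IS-A-TEST token:

```python
import re

lazy = re.compile(r'^(\d+|[A-Z]{2,}\S*?-)')   # stops at the first hyphen
greedy = re.compile(r'^(\d+|[A-Z]{2,}\S*-)')  # runs to the last hyphen in the token

line = "THIS-IS-A-TEST of something"

print(lazy.match(line).group(1))    # THIS-
print(greedy.match(line).group(1))  # THIS-IS-A-
```

Both versions stop before the first whitespace, because \S never matches a space.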

Upvotes: 2
