Sebastian

Reputation: 967

Do I need to include `r` before a regex search?

I'm trying to split each term on a delimiter. I'd like to store the leading number as index and the rest of the term as name.

My terms:

The Beehive
12. Bar 821
13. Natives Bar
14. Last Call Bar
15. Scarlet Lounge
16. Linden Room
17. Rooftop 25

I'm using this code:

terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar', '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

delim = re.match('\d+\. ', terms)

if delim is None:
    print(delim)
else:
     index = index[:delim.end()]
     name = index[delim.end():]

This fails to capture the split. I've tested it by printing delim, and it doesn't match anything.

Upvotes: 0

Views: 250

Answers (2)

Jens

Reputation: 9130

re.match() accepts a single string, not a list, so you have to iterate over terms and match each term individually:

>>> for term in terms:
...     match = re.match(r'^(?P<index>(\d+\. )?)(?P<name>.*)$', term)  # Return a match object which contains the named groups.
...     index, _, name = match.groups()  # Unpack the groups.
...     # index = match.group('index')
...     # name = match.group('name')
...     print(index, name)
... 
 The Beehive
12.  Bar 821
13.  Natives Bar
14.  Last Call Bar
15.  Scarlet Lounge
16.  Linden Room
17.  Rooftop 25

Also notice the named groups in the regular expression: re.match() returns a Match object, and each named group can be read from it with match.group('index') or match.group('name').
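
As a small sketch of how the named groups can be read back, here is the same pattern applied to two of the terms. The variable names (pattern, numbered, plain) are only illustrative, and groupdict() is used as one convenient way to get the named groups as a dict:

```python
import re

# Same pattern as in the loop above, compiled once for reuse.
pattern = re.compile(r'^(?P<index>(\d+\. )?)(?P<name>.*)$')

numbered = pattern.match('12. Bar 821')
plain = pattern.match('The Beehive')

# Named groups can be read one at a time...
print(repr(numbered.group('index')))  # '12. '
print(repr(numbered.group('name')))   # 'Bar 821'

# ...or all at once; the unnamed inner group is not included in the dict.
print(numbered.groupdict())  # {'index': '12. ', 'name': 'Bar 821'}
print(plain.groupdict())     # {'index': '', 'name': 'The Beehive'}
```

Reading by name avoids the throwaway `_` that is needed when unpacking match.groups(), since the optional inner group also counts as a group there.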

Regarding whether to use the r'' prefix or not, take a look at this question or this excerpt from the docs:

The r prefix, making the literal a raw string literal, is needed […] because escape sequences in a normal “cooked” string literal that are not recognized by Python, as opposed to regular expressions, now result in a DeprecationWarning and will eventually become a SyntaxError. See The Backslash Plague.
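
To make the difference concrete, here is a minimal sketch comparing a raw and a normal ("cooked") literal. It uses \b as the example because Python itself recognizes that escape, so the two spellings silently produce different patterns; the names text, cooked and raw are just for illustration:

```python
import re

text = '12. Bar 821'

# A raw literal keeps each backslash, so these two spell the same pattern.
assert r'\d+\. ' == '\\d+\\. '

# In a normal literal, '\b' is a single backspace character.
# In a raw literal, r'\b' is a backslash plus 'b', which re reads as a word boundary.
cooked = '\bBar\b'
raw = r'\bBar\b'

print(len(cooked), len(raw))    # 5 7
print(re.search(cooked, text))  # None: the text contains no backspace characters
print(re.search(raw, text))     # matches 'Bar' at positions 4 to 7
```

With '\d' the normal literal happens to still work, because Python does not recognize \d and leaves it alone (with a warning), but writing every pattern as a raw string avoids having to remember which escapes are which.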

Upvotes: 0

mad_

Reputation: 8273

You are passing a list where the regex functions expect a string. Loop over the list and search each term:

import re
terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar', '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

delim = re.compile(r'\d+\.')  # raw string, so \d and \. reach the regex engine unchanged
for term in terms:
    match = delim.search(term)
    if match:
        print(term[:match.end()]) #index
        print(term[match.end():]) #name
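
If the goal is to keep the number and the name as separate values rather than print them, one possible variation is sketched below. The helper split_term and the name LEADING_NUMBER are hypothetical, not part of the original answer, and the sketch assumes the number only counts when it appears at the start of the term:

```python
import re

# Hypothetical helper: capture the leading number and the rest of the term separately.
LEADING_NUMBER = re.compile(r'(\d+)\.\s+(.*)')

def split_term(term):
    """Return (index, name); index is None when the term has no leading number."""
    m = LEADING_NUMBER.match(term)  # match() anchors at the start of the string
    if m is None:
        return None, term
    return int(m.group(1)), m.group(2)

terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar',
         '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

pairs = [split_term(term) for term in terms]
for index, name in pairs:
    print(index, name)
```

Using match() instead of search() here means a number later in the term, such as the 25 in 'Rooftop 25', can never be mistaken for the index.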

Upvotes: 2
