regex, on match strip and capture?

Question

I have a working block of code, but something tells me it's not the most efficient.

Start with a few strings.
If a string contains DBA or ATTN followed by at least 2 characters, capture from DBA or ATTN to the end of the line, and don't look at the remaining strings.
Strip out what was just captured.

What I have below seems to do that just fine.

import re

alt_name = ""

name1 = "JUST A NAME"
name2 = "UNITED STATES STORE DBA USA INC"
name3 = "ANOTHER FIELD"

regex = re.compile(r"\b(DBA\b.{2,})|\b(ATTN\b.{2,})")
if re.search(regex, name1):
    match = re.search(regex, name1)
    alt_name = match.group(0)
    name1 = re.sub(regex, "", name1)
elif re.search(regex, name2):
    match = re.search(regex, name2)
    alt_name = match.group(0)
    name2 = re.sub(regex, "", name2)
elif re.search(regex, name3):
    match = re.search(regex, name3)
    alt_name = match.group(0)
    name3 = re.sub(regex, "", name3)

print(name1)
print(name2)
print(name3)
print(alt_name)

Is there a way to capture and strip with just 1 line instead of searching, matching and then subbing? I'm looking for efficiency and readability. Just making it short to be clever isn't what I'm going for. Maybe this is just the way to do it?

Wiktor Stribiżew · Accepted Answer

You can pass a method as the replacement argument to re.sub. Inside that method, save the matched text to a variable, and return an empty string to remove the match from the input.

However, the pattern you have must be re-written to be more efficient:

r"\s*\b(?:DBA|ATTN)\b.{2,}"

See the regex demo.

\s* - 0+ whitespace chars
\b - a word boundary
(?:DBA|ATTN) - a non-capturing group matching either the DBA or the ATTN substring
\b - a word boundary
.{2,} - 2 or more chars other than line feed (LF) chars, as many as possible.
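
A quick way to check the breakdown above is to run the pattern against a few strings. This is only a sketch: the sample strings other than the ones from the question are made up for illustration, and the `found` dictionary is a hypothetical helper name.

```python
import re

rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

# The first two strings come from the question; the rest are invented samples.
samples = [
    "JUST A NAME",
    "UNITED STATES STORE DBA USA INC",
    "SOME STORE ATTN J",  # " J" is two chars, so .{2,} is satisfied
    "STORE DBA",          # nothing after the keyword, so no match
    "CANDBA STORE",       # no word boundary before DBA, so no match
]

found = {}
for s in samples:
    m = rx.search(s)
    # Keep the raw match (including leading whitespace) or None.
    found[s] = m.group(0) if m else None

for s, hit in found.items():
    print(repr(s), "->", repr(hit))
```

Note that the raw match keeps the whitespace consumed by `\s*`, which is why stripping it on the left is useful when saving the captured text.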

Here is an example:

import re

class RegexMatcher:
    val = ''
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def runsub(self, m):
        self.val = m.group(0).lstrip()
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rm = RegexMatcher()
name = "UNITED STATES STORE DBA USA INC"
print(rm.process(name))
print(rm.val)

See the Python demo.
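
If a class feels heavy for this, one alternative (a different technique from the callback approach, shown only as a sketch) is to search once and slice the string around the match. The function name `split_alt_name` is hypothetical, not something from the question.

```python
import re

rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

def split_alt_name(s):
    """Return (name_without_match, captured_text); captured_text is "" if no match."""
    m = rx.search(s)
    if m is None:
        return s, ""
    # Remove the matched span and strip the whitespace that \s* consumed.
    return s[:m.start()] + s[m.end():], m.group(0).lstrip()

name, alt_name = split_alt_name("UNITED STATES STORE DBA USA INC")
print(name)
print(alt_name)
```

This searches only once per string and returns both pieces together, so there is no shared state to keep track of.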

It may make more sense to make val a list and call self.val.append(m.group(0).lstrip()) in runsub, so that every match is kept rather than only the most recent one.
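
A minimal sketch of that list variant might look like the following. The class name `RegexCollector`, the attribute `vals`, and the third sample string are assumptions made for illustration. The list is created in `__init__` rather than as a class attribute so that separate instances do not share it.

```python
import re

class RegexCollector:
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def __init__(self):
        # Per-instance list; a class-level list would be shared by all instances.
        self.vals = []

    def runsub(self, m):
        # Record the matched text, then return "" so re.sub removes it.
        self.vals.append(m.group(0).lstrip())
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rc = RegexCollector()
names = ["JUST A NAME", "UNITED STATES STORE DBA USA INC", "ANOTHER FIELD ATTN JOHN"]
cleaned = [rc.process(n) for n in names]
print(cleaned)
print(rc.vals)
```

Unlike the question's if/elif chain, this processes every string instead of stopping at the first one that matches.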
