Reputation: 5274
I have a working block of code, but something tells me it's not the most efficient.
What I have below seems to do that just fine.
import re

alt_name = ""
name1 = "JUST A NAME"
name2 = "UNITED STATES STORE DBA USA INC"
name3 = "ANOTHER FIELD"
regex = re.compile(r"\b(DBA\b.{2,})|\b(ATTN\b.{2,})")

if re.search(regex, name1):
    match = re.search(regex, name1)
    alt_name = match.group(0)
    name1 = re.sub(regex, "", name1)
elif re.search(regex, name2):
    match = re.search(regex, name2)
    alt_name = match.group(0)
    name2 = re.sub(regex, "", name2)
elif re.search(regex, name3):
    match = re.search(regex, name3)
    alt_name = match.group(0)
    name3 = re.sub(regex, "", name3)

print(name1)
print(name2)
print(name3)
print(alt_name)
Is there a way to capture and strip with just 1 line instead of searching, matching and then subbing? I'm looking for efficiency and readability. Just making it short to be clever isn't what I'm going for. Maybe this is just the way to do it?
Upvotes: 1
Views: 259
Reputation: 626952
You may pass a method as the replacement argument to re.sub. Inside that method you can save the matched text into a variable, and if you want to remove the match found, just return an empty string.
However, the pattern you have must be re-written to be more efficient:
r"\s*\b(?:DBA|ATTN)\b.{2,}"
See the regex demo.
\s* - 0+ whitespace chars
\b - a word boundary
(?:DBA|ATTN) - either a DBA or ATTN substring
\b - a word boundary
.{2,} - 2 or more chars other than LF symbols, as many as possible.
Here is an example:
import re

class RegexMatcher:
    val = ''
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def runsub(self, m):
        self.val = m.group(0).lstrip()
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rm = RegexMatcher()
name = "UNITED STATES STORE DBA USA INC"
print(rm.process(name))
print(rm.val)
See the Python demo.
Maybe it makes more sense to make val a list variable, and then .append(m.group(0).lstrip()).
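A minimal sketch of that list variant, run over the three names from the question. The class name RegexCollector and the attribute vals are hypothetical, chosen only for this illustration, and the list is created per instance in __init__ so that separate collectors do not share state:

```python
import re


class RegexCollector:
    """Hypothetical variant of RegexMatcher that keeps every removed match."""

    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def __init__(self):
        # Per-instance list (an assumption of this sketch, not in the answer).
        self.vals = []

    def runsub(self, m):
        # Record the matched text without its leading whitespace,
        # then return "" so re.sub removes the match from the string.
        self.vals.append(m.group(0).lstrip())
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)


rc = RegexCollector()
names = ["JUST A NAME", "UNITED STATES STORE DBA USA INC", "ANOTHER FIELD"]
cleaned = [rc.process(n) for n in names]
print(cleaned)
print(rc.vals)
```

Note that, unlike the if/elif chain in the question, this collects a match from every name that has one rather than stopping at the first.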
Upvotes: 1