Reputation: 53
I want to match the word 'St', 'St.', 'st', or 'st.', BUT only as the first word of a string.
For example:
'St. Mary Church Church St.' - should find ONLY the first 'St.'
'st. Mary Church Church St.' - should find ONLY 'st.'
'st Mary Church Church St.' - should find ONLY 'st'
I want to eventually replace the first occurrence with 'Saint'.
Upvotes: 2
Views: 4398
Reputation: 1089
You don't need a regex for this. Just use the split() method on your string to split it on whitespace, which returns a list of every word in the string:
matches = ["St", "St.", "st", "st."]
name = "St. Mary Church Church St."
words = name.split()  # split the string into a list of words
if words and words[0] in matches:  # the "words and" guard avoids an IndexError on an empty string
    words[0] = "Saint"  # replace the first word in the list (St.) with Saint
new_name = " ".join(words)  # rebuild the name from the words, separated by single spaces
print(new_name)  # Output: "Saint Mary Church Church St."
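One thing to be aware of: split() followed by " ".join() collapses any runs of whitespace in the rest of the string. If that matters, a sketch along these lines limits the split to the first word only. The function name expand_saint is made up for illustration.

```python
def expand_saint(name):
    """Replace a leading 'St'/'St.'/'st'/'st.' with 'Saint' (hypothetical helper)."""
    matches = {"St", "St.", "st", "st."}
    parts = name.split(maxsplit=1)  # at most two pieces: first word, remainder
    if not parts or parts[0] not in matches:
        return name  # empty string or no leading 'St': leave unchanged
    parts[0] = "Saint"
    return " ".join(parts)

print(expand_saint("St. Mary  Church St."))  # inner double space is preserved
```

The remainder of the string is never re-split, so its internal spacing is kept as it was.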
Upvotes: 2
Reputation: 4579
re.sub() allows you to define the number of occurrences to replace in a string:
import re
s = "St. Mary Church Church St."
new_s = re.sub(r'^(St\.|st\.|St|st)\s', 'Saint ', s, count=1)  # count=1 replaces only the first occurrence; the dot is escaped so it matches a literal '.' rather than any character
print(new_s)
# 'Saint Mary Church Church St.'
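As a quick check of the edge cases, the sketch below compares a pattern with an unescaped dot (which matches any character) against one with the dot escaped. The sample strings and the names loose and strict are made up for illustration.

```python
import re

loose = re.compile(r'^(St.|st.|St|st)\s')  # '.' matches any character
strict = re.compile(r'^[Ss]t\.?\s')         # '\.' matches a literal dot only

for s in ["St. Mary Church", "st Mary Church", "Sta Maria Church"]:
    # Show the result of each pattern side by side
    print(s, '->', loose.sub('Saint ', s, count=1), '|', strict.sub('Saint ', s, count=1))
```

With the unescaped dot, "Sta Maria Church" is rewritten as well, which is probably not what is wanted.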
Upvotes: 3
Reputation: 19395
Python 3.10 introduced a new Structural Pattern Matching feature (otherwise known as match/case) which can fit this use-case:
s = "St. Mary Church Church St."
words = s.split()
match words:
    case ["St" | "St." | "st" | "st.", *rest]:
        print("Found st at the start")
        words[0] = "Saint"
    case _:
        print("didn't find st at the start")
print(' '.join(words))
Will give:
Found st at the start
Saint Mary Church Church St.
While using s = "Mary Church Church St." will give:
didn't find st at the start
Mary Church Church St.
Upvotes: 0
Reputation: 726
You can simply pass the flags parameter to the sub function. This lets you shorten the pattern, which makes the code a little cleaner and reduces the chance of missing a variant:
import re
s = "St. Mary Church Church St."
new_s = re.sub(r'^(st\.|st)\s', 'Saint ', s, count=1, flags=re.IGNORECASE)  # ignoring case shortens the pattern; the dot is escaped so it matches a literal '.'
print(new_s)
# 'Saint Mary Church Church St.'
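Keep in mind that re.IGNORECASE also accepts spellings the question did not list, such as 'ST.' or 'sT'. A small sketch to illustrate, with made-up sample strings:

```python
import re

# Case-insensitive match of a leading 'st' with an optional literal dot
pattern = re.compile(r'^st\.?\s', flags=re.IGNORECASE)

for s in ["St. Mary Church", "st Mary Church", "ST. Mary Church", "Stone Church St."]:
    print(pattern.sub('Saint ', s, count=1))
```

If only the four exact spellings should match, listing them explicitly in the pattern is the safer choice.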
Upvotes: 1
Reputation: 11
Try using the regex '^\S+' to match the first run of non-space characters (that is, the first word) in your string.
import re
s = 'st Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group() # 'st'
s = 'st. Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group() # 'st.'
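To go from the match to the replacement, the matched word still has to be compared with the allowed spellings. A minimal sketch, assuming the four spellings from the question; the names ALLOWED and replace_leading_st are hypothetical.

```python
import re

ALLOWED = {"St", "St.", "st", "st."}

def replace_leading_st(s):
    m = re.match(r'\S+', s)  # re.match only matches at the start of the string
    if m and m.group() in ALLOWED:
        return "Saint" + s[m.end():]  # keep the rest of the string untouched
    return s

print(replace_leading_st('st Mary Church Church St.'))
```

Slicing from m.end() leaves everything after the first word exactly as it was, including the trailing 'St.'.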
Upvotes: 0
Reputation: 8521
import re
string = "St. Mary Church Church St."
replace = {'St': 'Saint', 'St.': 'Saint', 'st': 'Saint', 'st.': 'Saint'}
# Try longer keys first so 'St.' wins over 'St'
keys = sorted(replace, key=len, reverse=True)
# Anchor at the start and require whitespace or end of string after the match
pattern = re.compile(r"^(?:" + "|".join(re.escape(k) for k in keys) + r")(?=\s|$)")
result = pattern.sub(lambda m: replace[m.group(0)], string)
print(result)  # Saint Mary Church Church St.
This should work, I guess, but please check it.
Upvotes: -2