jon jon

Reputation: 53

How to match the first word in a string?

I want to match the word 'St', 'St.', 'st' or 'st.', BUT only as the first word of a string. For example, in 'St. Mary Church Church St.' it should find ONLY the first 'St.'.

I want to eventually replace the first occurrence with 'Saint'.

Upvotes: 2

Views: 4398

Answers (6)

Dartmouth

Reputation: 1089

You don't need to use a regex for this, just use the split() method on your string to split it by whitespace. This will return a list of every word in your string:

matches = ["St", "St.", "st", "st."]
name = "St. Mary Church Church St."

words = name.split()  # split the string into words into a list
if words and words[0] in matches:  # guard against an empty string, where words would be []
    words[0] = "Saint"  # replace the first word in the list (St.) with Saint
new_name = " ".join(words)  # create the new name from the words, separated by spaces
print(new_name)  # Output: "Saint Mary Church Church St."
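Note that split() followed by " ".join() collapses any repeated whitespace in the name. If that matters, a possible variation is to split off only the first word with maxsplit=1. This is a minimal sketch, not part of the original answer; the function name expand_saint and its abbreviations parameter are made up for illustration:

```python
def expand_saint(name, abbreviations=("St", "St.", "st", "st.")):
    """Replace a leading 'St'/'St.' with 'Saint' (hypothetical helper)."""
    # Split off only the first word; spacing inside the remainder is kept as-is.
    parts = name.split(maxsplit=1)
    if parts and parts[0] in abbreviations:
        parts[0] = "Saint"
        return " ".join(parts)
    # No leading abbreviation (or empty input): return the string unchanged.
    return name


print(expand_saint("St. Mary Church Church St."))
```

Leading whitespace and the gap right after the first word are still normalized when a replacement happens, so this only preserves spacing further along in the string.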

Upvotes: 2

JazZ

Reputation: 4579

re.sub takes a count argument that limits how many occurrences are replaced in a string:

import re

s = "St. Mary Church Church St."
# count=1 replaces only the first occurrence. The period is escaped as \. because a bare . matches any character.
new_s = re.sub(r'^[Ss]t\.?\s', r'Saint ', s, count=1)
print(new_s)
#  'Saint Mary Church Church St.'
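One thing to watch with patterns like this: in a regex, an unescaped . matches any character, so an alternative written as St. also matches 'Sta' or 'Sto'. The comparison below is a small sketch of the difference; the variable names loose and strict and the sample string are only for illustration:

```python
import re

loose = re.compile(r'^(St.|st.|St|st)\s')  # unescaped '.' matches any character
strict = re.compile(r'^[Ss]t\.?\s')        # '\.?' matches an optional literal period

s = "Sta Maria Church"
loose_result = loose.sub('Saint ', s, count=1)
strict_result = strict.sub('Saint ', s, count=1)

print(loose_result)   # Saint Maria Church  (false positive on 'Sta')
print(strict_result)  # Sta Maria Church    (left unchanged)
```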

Upvotes: 3

Tomerikoo

Reputation: 19395

Python 3.10 introduced a new Structural Pattern Matching feature (otherwise known as match/case) which can fit this use-case:

s = "St. Mary Church Church St."

words = s.split()
match words:
    case ["St" | "St." | "st" | "st.", *rest]:
        print("Found st at the start")
        words[0] = "Saint"
    case _:
        print("didn't find st at the start")

print(' '.join(words))

Will give:

Found st at the start
Saint Mary Church Church St.

While using s = "Mary Church Church St." will give:

didn't find st at the start
Mary Church Church St.

Upvotes: 0

Robert Hadsell

Reputation: 726

You can simply pass the flags parameter to re.sub. With re.IGNORECASE the pattern no longer has to list both the upper-case and lower-case variants, which makes the code a little cleaner and reduces the chance of missing one:

import re

s = "St. Mary Church Church St."
# Ignoring case shortens the pattern from the answer above; \. keeps the period literal.
new_s = re.sub(r'^st\.?\s', r'Saint ', s, count=1, flags=re.IGNORECASE)
print(new_s)
#  'Saint Mary Church Church St.'
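Be aware that re.IGNORECASE accepts every casing, not just the four forms listed in the question, so 'ST' and 'sT.' are replaced as well. A quick sketch to check which inputs are affected (the sample strings are made up, and the period is escaped here so that it only matches a literal '.'):

```python
import re

pattern = re.compile(r'^st\.?\s', flags=re.IGNORECASE)

samples = ["St. Mary", "ST Mary", "sT. Mary", "Stone Mary"]
# Map each sample to its result after replacing a leading abbreviation.
results = {s: pattern.sub('Saint ', s, count=1) for s in samples}

for original, replaced in results.items():
    print(f"{original!r} -> {replaced!r}")
```

If only the exact forms 'St', 'St.', 'st' and 'st.' should match, a character class such as [Ss]t without the flag is the stricter choice.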

Upvotes: 1

orz

Reputation: 11

Try using the regex '^\S+' to match the first run of non-space characters (that is, the first word) in your string.

import re

s = 'st Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st'

s = 'st. Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st.'
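To go from matching the first word to replacing it, one option is to pass a function as the replacement to re.sub and decide there whether the word is one of the abbreviations. This is only a sketch; ABBREVIATIONS, replace_first_word and expand are hypothetical names, not part of the answer above:

```python
import re

ABBREVIATIONS = {"St", "St.", "st", "st."}  # the four forms from the question


def replace_first_word(match):
    """Return 'Saint' for a known abbreviation, otherwise the word unchanged."""
    word = match.group()
    return "Saint" if word in ABBREVIATIONS else word


def expand(s):
    # ^\S+ matches only if the string begins with a non-space character.
    return re.sub(r'^\S+', replace_first_word, s, count=1)


print(expand('st. Mary Church Church St.'))
print(expand('Stone Road'))
```

A string that begins with whitespace is returned unchanged, because ^\S+ cannot match at position 0 in that case.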

Upvotes: 0

Jeril

Reputation: 8521

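import re

string = "St. Mary Church Church St."

replace = {'St.': 'Saint', 'St': 'Saint', 'st.': 'Saint', 'st': 'Saint'}
# Try longer keys first so that 'St.' wins over 'St', and escape them so '.' is literal
keys = sorted(replace, key=len, reverse=True)
# ^ anchors the match to the start; (?=\s|$) requires the word to end there
pattern = re.compile(r"^(?:" + "|".join(re.escape(k) for k in keys) + r")(?=\s|$)")
result = pattern.sub(lambda m: replace[m.group(0)], string)
print(result)  # Saint Mary Church Church St.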

This should work I guess, please check. Source

Upvotes: -2
