Edward Smith

Reputation: 13

Python Regex to add a "?" to the beginning of a word in a word list

import re

word_list = []
# Open the text file
with open('words') as f:
    for line in f:
        # Pull out all 3-letter words using a regular expression and add them to word_list
        word_list += re.findall(r'\b(\w{3})\b', line)

I use this to find all 3-letter words in a dictionary file. From there, I want to add a question mark to the beginning of each word. I assume I need the re.sub function, but I can't seem to get the syntax right.

Upvotes: 1

Views: 473

Answers (3)

Anis R.

Reputation: 6912

You can use re.sub, where \1 refers to the first capture group:

re.sub(r'\b(\w{3})\b', r'?\1', line)
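As a minimal sketch of how that substitution behaves, here it is applied to a single line. The sample sentence and the variable name prefixed are assumptions made for illustration, not part of the question.

```python
import re

# Hypothetical sample line, used only for illustration.
line = "the quick brown fox called bob"

# \b(\w{3})\b matches whole three-character words;
# \1 in the replacement re-inserts the matched word after the literal '?'.
prefixed = re.sub(r'\b(\w{3})\b', r'?\1', line)

print(prefixed)  # ?the quick brown ?fox called ?bob
```

Note that re.sub returns the whole line with the replacements made in place, not a list of the matched words, so the other words on the line are still there.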

Upvotes: 1

Jon Clements

Reputation: 142156

You can do this in a few ways. One is to collect all your 3-letter words first and then update them afterwards; alternatively, you can do something along the lines of what you're already doing and extend a list as you go. There's not really a need for re.sub here if you want to end up with a list of 3-letter words prefixed with ?

Sample words file:

the quick brown fox called bob jumped over the lazy dog
and went straight to bed
cos bob needed to sleep right now

Sample code:

import re

word_list = []
with open('words') as fin:
    for line in fin:
        matches = re.findall(r'\b(\w{3})\b', line)
        word_list.extend(f'?{word}' for word in matches)

Sample word_list after run:

['?the',
 '?fox',
 '?bob',
 '?the',
 '?dog',
 '?and',
 '?bed',
 '?cos',
 '?bob',
 '?now']
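The other route mentioned in this answer, collecting the words first and prefixing them afterwards, could look roughly like the sketch below. It assumes the sample text is held in a string rather than read from the words file, and the names text and three_letter_words are made up for illustration.

```python
import re

# Sample text inlined (an assumption) so the sketch runs without a 'words' file.
text = """the quick brown fox called bob jumped over the lazy dog
and went straight to bed
cos bob needed to sleep right now"""

# Step 1: collect every three-character word.
three_letter_words = re.findall(r'\b\w{3}\b', text)

# Step 2: prefix each collected word afterwards.
word_list = ['?' + word for word in three_letter_words]

print(word_list)
```

This should give the same list as the line-by-line version; the difference is only whether the prefix is added while matching or in a second pass.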

Upvotes: 1

Michał Turczyn

Reputation: 37367

First compile the pattern:

pattern = re.compile(r'\b(\w{3})\b')

and then use it on each line like this:

word_list += ['?' + word for word in pattern.findall(line)]

Upvotes: 0
