Reputation: 195
I am doing a spell check tutorial in Python and it uses this regular expression:
import re
def split_line(line):
    return re.findall('[A-Za-z]+(?:\`[A-Za-z)+)?',line)
I was wondering if you could help me change this function so that it does not split on ', i.e. if I input the string he's I will get ["he's"] and not ['he', 's'].
Upvotes: 0
Views: 1361
Reputation: 627022
Your regex is supposed to match one or more letters and then an optional occurrence of a backtick and again one or more letters. You can put the backtick into a character class and add ' to the class.
Note that you do not need to escape ' if you use a double-quoted string literal:
re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", line)
See the regex demo. Details:
[A-Za-z]+ - one or more ASCII letters (use [^\W\d_]+ to match one or more Unicode letters)
(?:['`][A-Za-z]+)* - zero or more occurrences of ' or a backtick followed by one or more ASCII letters.
See the Python demo:
import re
text = "And he's done it o`key!"
print(re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", text))
# => ['And', "he's", 'done', 'it', 'o`key']
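The note above about [^\W\d_]+ can be tried out with a small sketch. The function name split_line_unicode and the sample sentence are made up for illustration; the pattern is the one from this answer with [A-Za-z] swapped for [^\W\d_], assuming Python 3 str input with precomposed accented characters:

```python
import re

# [^\W\d_] = "a word character that is not a digit and not an underscore",
# which for Python 3 str patterns means any Unicode letter.
UNICODE_WORD = re.compile(r"[^\W\d_]+(?:['`][^\W\d_]+)*")

def split_line_unicode(line):  # hypothetical helper name
    """Return words, keeping ' and ` only when they sit between letters."""
    return UNICODE_WORD.findall(line)

# Escapes are used so the accented letters are single precomposed code points.
text = "Zo\u00eb's caf\u00e9 isn't na\u00efve"
print(split_line_unicode(text))
# => ["Zoë's", 'café', "isn't", 'naïve']
```

With the ASCII-only class, café would be cut to caf, since é is outside A-Za-z.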
Upvotes: 0
Reputation: 17629
First you'll need to fix the original expression by replacing ) with ] as mentioned by Marcin. Then simply add ' to the list of allowed characters (escaped by a backslash):
import re
def split_line(line):
    return re.findall('[A-Za-z\']+(?:`[A-Za-z]+)?', line)
split_line("He's my hero")
#["He's", 'my', 'hero']
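Of course, this does not handle edge cases where the apostrophe is at the beginning or at the end of a word: because ' is part of the character class, a leading, trailing, or even stand-alone apostrophe is matched as part of a token.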
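To make those edge cases concrete, here is a small comparison, offered as a sketch rather than a definitive fix. The names loose and strict and the sample text are invented for illustration; strict is the pattern from the other answer in this thread, which only accepts an apostrophe or backtick between letters:

```python
import re

# Pattern from this answer: ' is part of the leading character class.
loose = re.compile(r"[A-Za-z']+(?:`[A-Za-z]+)?")
# Pattern from the other answer: ' or ` is allowed only between letters.
strict = re.compile(r"[A-Za-z]+(?:['`][A-Za-z]+)*")

text = "'quoted' dogs' - ' he's"

print(loose.findall(text))
# => ["'quoted'", "dogs'", "'", "he's"]
print(strict.findall(text))
# => ['quoted', 'dogs', "he's"]
```

Which behaviour is preferable depends on the spell checker: the strict form drops quote marks around words, but it also drops the trailing apostrophe of a possessive such as dogs'.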
Upvotes: 1