Reputation:
Suppose we have the sentence "George coudn't play football in y. 1998 but plays football at θ. 226", where the single letter before each number can be any letter of the Greek or English alphabet. Is there a way to get "George coudn't play football in but plays football at" as the output?
I tried this, which removed only the numbers:
import re
re_numb = re.compile(r'\d+')
sent = re_numb.sub('', sent)  # strips digits only; the "y." and "θ." prefixes remain
Upvotes: 0
Views: 99
Reputation: 978
The following regex captures the text that precedes each marker such as y. 1998 and θ. 226:
(\D)+(?=\s.\.\s\d+)
If you want to capture only the text up to the first marker, add ^ so the pattern matches only from the start of the string:
^(\D)+(?=\s.\.\s\d+)
EDIT Code sample
To extract each match
import re
text = "George couldn't play football in y. 1998 but plays football at θ. 226"
for match in re.finditer(r'(\D)+(?=\s.\.\s\d+)', text):
    print(match.group(), end='')  # print without a trailing newline
Output
George couldn't play football in but plays football at
To extract only the first match
import re
text = "George couldn't play football in y. 1998 but plays football at θ. 226"
for match in re.finditer(r'^(\D)+(?=\s.\.\s\d+)', text):
    print(match.group(), end='')  # the ^ anchor allows at most one match
Output
George couldn't play football in
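Since only the first match is wanted here, a loop is not strictly needed. The following is a minimal sketch, assuming the same input text; `first_chunk` is a hypothetical helper name, and `re.match` is used in place of the `^` anchor because it only ever matches at the start of the string.

```python
import re

# Same lookahead as above, without the capturing group or the ^ anchor;
# re.match already anchors the match at the start of the string.
LEADING_TEXT = re.compile(r'\D+(?=\s.\.\s\d+)')

def first_chunk(text):
    """Return the text before the first 'letter. number' marker, or None."""
    match = LEADING_TEXT.match(text)
    return match.group() if match else None

text = "George couldn't play football in y. 1998 but plays football at θ. 226"
print(first_chunk(text))
```

If the string contains no such marker, `first_chunk` returns None instead of raising.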
Upvotes: 0
Reputation: 43199
Just use a Unicode range as in
\s+[\u03b1-\u03c9]+\.\s+\d+
The range \u03b1-\u03c9 covers the lowercase Greek letters α to ω. See a demo on regex101.com and a Unicode table for Greek letters.
In Python this could be
import re
pattern = re.compile(r'\s+[\u03b1-\u03c9]+\.\s+\d+')
sentence = "George coudn't play football in γ. 1998 but plays football at θ. 226"
sentence = pattern.sub('', sentence)
print(sentence)
And yields
George coudn't play football in but plays football at
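The question also mentions English letters, which the lowercase Greek range above does not cover. As a rough sketch, and assuming the marker is always one or more unaccented Latin or Greek letters followed by a dot, whitespace, and digits, the character class can be widened. The name `strip_markers` and the exact ranges chosen are illustrative assumptions, not part of the original answer.

```python
import re

# Assumed character class: ASCII letters plus the basic Greek capitals
# (U+0391-U+03A9) and lowercase letters (U+03B1-U+03C9). Accented Greek
# letters are not included in this sketch.
MARKER = re.compile(r'\s+[A-Za-z\u0391-\u03a9\u03b1-\u03c9]+\.\s+\d+')

def strip_markers(sentence):
    """Remove every ' <letters>. <digits>' marker from the sentence."""
    return MARKER.sub('', sentence)

sentence = "George coudn't play football in y. 1998 but plays football at θ. 226"
print(strip_markers(sentence))
```

With the sentence from the question, this removes both " y. 1998" and " θ. 226" and leaves the rest of the text unchanged.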
Upvotes: 1