python regex keep only words that start with alphabet and continues with [a-zA-Z0-9]

Question

Given this text "hey a2a 3beauty hou\se heyYou2", I would like to keep only the words that start with a letter and continue with a-z, A-Z, or digits. So this would be my desired output: "hey a2a heyYou2".

My solution so far goes through the text.split() function:

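import re

# raw string, so that \s stays a literal backslash followed by "s"
text = r"hey a2a 3beauty hou\se heyYou2"
text = text.split()
# keep only the words that are a letter followed by letters or digits
text = [w for w in text if re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", w) is not None]
' '.join(text)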

Out[55]: 'hey a2a heyYou2'

Is there a faster, more efficient way to achieve this using regex, without splitting the text into a list of words?

Wiktor Stribiżew · Accepted Answer

You may use a single re.sub call with the following regex:

\s*(?<!\S)(?![a-zA-Z][a-zA-Z0-9]*(?!\S))\S+

See the regex demo

Details


\s* - 0+ whitespaces
(?<!\S) - a leading whitespace boundary (no non-whitespace char immediately to the left)
(?![a-zA-Z][a-zA-Z0-9]*(?!\S)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are:
    [a-zA-Z] - a letter
    [a-zA-Z0-9]* - 0 or more alphanumeric chars
    (?!\S) - a trailing whitespace boundary
\S+ - one or more non-whitespace chars
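
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming the re.VERBOSE flag is acceptable for your code base; the names NON_WORD and keep_valid_words are hypothetical and only used for illustration.

```python
import re

# The same pattern, spelled out with re.VERBOSE so each part carries its note.
NON_WORD = re.compile(
    r"""
    \s*                 # 0+ whitespace chars in front of the unwanted word
    (?<!\S)             # leading whitespace boundary
    (?!                 # fail if what follows is a word we want to keep:
        [a-zA-Z]        #   a letter
        [a-zA-Z0-9]*    #   0 or more letters or digits
        (?!\S)          #   trailing whitespace boundary
    )
    \S+                 # the unwanted word itself
    """,
    re.VERBOSE,
)


def keep_valid_words(text):
    # Hypothetical helper: delete every word that is not letter + alphanumerics.
    return NON_WORD.sub("", text)


print(keep_valid_words(r"hey a2a 3beauty hou\se heyYou2"))
```

Note that when the very first word is removed, the whitespace after it is left in place, so you may want to call .strip() on the result.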


Python code demo:

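import re
# raw string, so that \s stays a literal backslash followed by "s"
text = r"hey a2a 3beauty hou\se heyYou2"
print(re.sub(r"\s*(?<!\S)(?![a-zA-Z][a-zA-Z0-9]*(?!\S))\S+", "", text))
# => hey a2a heyYou2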
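
As a rough sanity check, the re.sub approach can be compared with the split-based approach from the question. This is only a sketch: the function names and sample strings are made up for illustration, and by_findall is an alternative formulation that is not part of the answer above (it matches the wanted words instead of removing the unwanted ones).

```python
import re

REMOVE = re.compile(r"\s*(?<!\S)(?![a-zA-Z][a-zA-Z0-9]*(?!\S))\S+")
VALID = re.compile(r"(?<!\S)[a-zA-Z][a-zA-Z0-9]*(?!\S)")


def by_split(text):
    # The question's approach: split, filter, re-join.
    return " ".join(
        w for w in text.split() if re.fullmatch(r"[a-zA-Z][a-zA-Z0-9]*", w)
    )


def by_sub(text):
    # The answer's approach: remove every word that is not wanted.
    return REMOVE.sub("", text)


def by_findall(text):
    # Alternative (an assumption, not from the answer): collect the wanted words.
    return " ".join(VALID.findall(text))


samples = [r"hey a2a 3beauty hou\se heyYou2", "3beauty hey", "a  b2  2c", ""]
for s in samples:
    # Compare word lists, because the approaches may leave different whitespace.
    assert by_sub(s).split() == by_split(s).split() == by_findall(s).split()
```

The comparison is done on word lists rather than raw strings because re.sub keeps the original spacing between the surviving words, while the other two approaches re-join them with single spaces.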
