How to remove any number of non-alphabetical symbols from the beginning of a word?

Question

I have the following words:

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']

I need to drop any non-alphabetic characters from the start of each word. The expected result is:

processed = ['operating', 'application', 'aaa0225', 'setup', 'setup']

This is what I have so far:

import re
processed = []
for w in words:
  w = re.sub(r"(?


Any suggestions?

Wiktor Stribiżew · Accepted Answer

You can use

import re
re.sub(r"^[\W\d_]+", "", w)

With the PyPI regex module, you can use

import regex
regex.sub(r"^\P{L}+", "", w)

Details

^ - start of string (here, same as \A)
[\W\d_]+ - one or more chars, each of which is a non-word char, a digit, or an underscore
\P{L}+ - one or more chars other than any Unicode letters.
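To see how the pieces above interact, here is a minimal sketch using only the standard re module. The helper name strip_leading_non_letters and the extra sample strings are made up for illustration; they are not from the question.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
LEADING_NON_LETTERS = re.compile(r"^[\W\d_]+")

def strip_leading_non_letters(word):
    # Hypothetical helper: removes only the run of matching chars
    # at the very start of the string, because of the ^ anchor.
    return LEADING_NON_LETTERS.sub("", word)

print(strip_leading_non_letters("3aaa0225"))  # aaa0225 (digits after the first letter stay)
print(strip_leading_non_letters("__init"))    # init (the explicit _ in the class removes underscores)
print(strip_leading_non_letters("42élan"))    # élan (é is a word char and not a digit, so it is kept)
print(strip_leading_non_letters("12345"))     # empty string, since no letter is present
```

Because the match is anchored at the start, anything after the first letter is left untouched.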

See a Python demo:

import re, regex
words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']

print( [re.sub(r"^[\W\d_]+", "", w) for w in words] )
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']

print( [regex.sub(r"^\P{L}+", "", w) for w in words] )
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
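If installing the regex module is not an option, a regex-free variant is possible as well. This is a different technique from the one above, sketched under the assumption that "letter" means what str.isalpha() reports (Unicode general category L, which lines up with \P{L}); the function name is hypothetical.

```python
from itertools import dropwhile

def strip_leading_non_letters_plain(word):
    # Hypothetical helper: skip chars while they are not letters,
    # then keep everything from the first letter onward.
    return "".join(dropwhile(lambda ch: not ch.isalpha(), word))

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
print([strip_leading_non_letters_plain(w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
```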
