Fluxy

Reputation: 2978

How to remove any number of non-alphabetical symbols from the beginning of a word?

I have the following words:

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']

I need to drop any non-alphabetic characters that appear at the start of each word. The expected result is:

processed = ['operating', 'application', 'aaa0225', 'setup', 'setup']

This is what I have so far:

import re
processed = []
for w in words:
    # (?<!\S) is a zero-width assertion: only an empty string is replaced, so w is unchanged
    w = re.sub(r"(?<!\S)", "", w)
    processed.append(w)

Any suggestions?

Upvotes: 4

Views: 67

Answers (1)

Wiktor Stribiżew

Reputation: 627082

You can use

import re
re.sub(r"^[\W\d_]+", "", w)
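In Python 3, \W and \d are Unicode-aware for str patterns by default, so the character class above keeps non-ASCII letters. The sketch below contrasts that default with the re.ASCII flag; the sample word "12élan" is a made-up input, not from the question.

```python
import re

# The answer's pattern: a leading run of non-word chars, digits or underscores
LEADING_NON_LETTERS = r"^[\W\d_]+"

# Hypothetical sample containing a non-ASCII letter (U+00E9, precomposed e-acute)
word = "12\u00e9lan"

# Default (Unicode) semantics: U+00E9 is a word character, so it is kept
unicode_result = re.sub(LEADING_NON_LETTERS, "", word)

# ASCII semantics: U+00E9 is now matched by \W and is stripped as well
ascii_result = re.sub(LEADING_NON_LETTERS, "", word, flags=re.ASCII)

print(unicode_result)  # élan
print(ascii_result)    # lan
```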

With the PyPI regex module, you can use

import regex
regex.sub(r"^\P{L}+", "", w)

Details

  • ^ - start of string (here, same as \A)
  • [\W\d_]+ - one or more characters that are non-word characters, digits or underscores
  • \P{L}+ - one or more characters that are not Unicode letters
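Both the ^ anchor and the + quantifier matter here. A small sketch with the question's own inputs: without ^, digits inside the word are removed too, and without +, only a single leading character is removed.

```python
import re

word = "3aaa0225"

# Anchored: only the leading run of non-letters is removed
anchored = re.sub(r"^[\W\d_]+", "", word)

# Unanchored: every run of non-letters is removed, including the inner digits
unanchored = re.sub(r"[\W\d_]+", "", word)

# Anchored but without +: ^ matches only at position 0, so just one '-' is removed
single = re.sub(r"^[\W\d_]", "", "--setup")

print(anchored)    # aaa0225
print(unanchored)  # aaa
print(single)      # -setup
```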

See a Python demo:

import re
import regex  # third-party module: pip install regex

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']

print([re.sub(r"^[\W\d_]+", "", w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']

print([regex.sub(r"^\P{L}+", "", w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
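If you would rather avoid regular expressions altogether, here is a possible alternative (not part of the original answer), sketched with itertools.dropwhile and str.isalpha. The helper name strip_leading_non_letters is hypothetical. Since str.isalpha() is True for characters in the Unicode "Letter" categories, this should behave like the \P{L}+ version.

```python
from itertools import dropwhile


def strip_leading_non_letters(word: str) -> str:
    """Drop characters from the start of word until the first letter."""
    # str.isalpha() covers Unicode letter categories (Lu, Ll, Lt, Lm, Lo),
    # so skipping "not isalpha" characters mirrors the \P{L}+ regex idea
    return "".join(dropwhile(lambda ch: not ch.isalpha(), word))


words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
processed = [strip_leading_non_letters(w) for w in words]
print(processed)
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
```

Like the regex versions, it returns an empty string for a word that contains no letters at all.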

Upvotes: 2
