Reputation: 2978
I have the following words:
words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
I need to drop any non-alphabetic characters at the start of each word. The expected result is this one:
processed = ['operating', 'application', 'aaa0225', 'setup', 'setup']
This is what I have so far:
import re
processed = []
for w in words:
    w = re.sub(r"(?<!\S)", "", w)
    processed.append(w)
Any suggestions?
Upvotes: 4
Views: 67
Reputation: 627082
You can use
import re
re.sub(r"^[\W\d_]+", "", w)
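A minimal sketch that wraps the same pattern in a helper and compiles it once. The name strip_leading_non_letters is hypothetical. In Python 3, str patterns are Unicode-aware by default, so \W and \d also cover non-ASCII characters: accented letters are kept and non-ASCII decimal digits are stripped. Only the leading run is removed, so digits later in the word (as in 'aaa0225') survive.

```python
import re

# Compile once: one or more non-word, digit or underscore chars at the start
LEADING_NON_LETTERS = re.compile(r"^[\W\d_]+")

def strip_leading_non_letters(word: str) -> str:
    """Remove the leading run of non-word/digit/underscore chars (hypothetical helper)."""
    return LEADING_NON_LETTERS.sub("", word)

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
processed = [strip_leading_non_letters(w) for w in words]
print(processed)
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']

# Unicode-aware: 'é' is a word char, '٣' (Arabic-Indic digit three) is matched by \d
print(strip_leading_non_letters('12élan'))  # => élan
print(strip_leading_non_letters('٣abc'))    # => abc
print(repr(strip_leading_non_letters('123')))  # => '' (nothing left)
```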
With the PyPI regex module, you can use
import regex
regex.sub(r"^\P{L}+", "", w)
Details:
^ - start of string (here, same as \A)
[\W\d_]+ - one or more non-word, digit or underscore chars
\P{L}+ - one or more chars other than any Unicode letters.
See a Python demo:
import re, regex
words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
print([re.sub(r"^[\W\d_]+", "", w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
print([regex.sub(r"^\P{L}+", "", w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']
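If you would rather not depend on the third-party regex module, a regex-free sketch using itertools.dropwhile and str.isalpha should behave like the \P{L}+ pattern, because str.isalpha is true exactly for characters in the Unicode "Letter" categories (Lu, Ll, Lt, Lm, Lo). This is a swapped-in technique, not part of the answer above, and the helper name drop_leading_non_letters is hypothetical.

```python
from itertools import dropwhile

def drop_leading_non_letters(word: str) -> str:
    """Skip chars until the first Unicode letter, keep the rest (hypothetical helper)."""
    return "".join(dropwhile(lambda ch: not ch.isalpha(), word))

words = ['001operating', '1002application', '3aaa0225', '-setup', '--setup']
print([drop_leading_non_letters(w) for w in words])
# => ['operating', 'application', 'aaa0225', 'setup', 'setup']

# '²' (superscript two) is not a letter, so it is dropped here, as with \P{L}+.
# The re pattern [\W\d_] keeps it: '²' is a word char but not a decimal digit (\d).
print(drop_leading_non_letters('²abc'))  # => abc
```

The two re-based and letter-based approaches only differ on characters like '²' that count as alphanumeric without being letters or decimal digits, so pick the one whose definition of "letter" fits your data.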
Upvotes: 2