Reputation: 23
I need to delete all the text that comes before the first letter or number, using Python.
The strings I have to deal with can be:
- Presa di coscienza
-3D is better than 2D
Basi di ottica
And the result has to be:
Presa di coscienza
3D is better than 2D
Basi di ottica
Searching the internet, I built this regex:
^.*?([A-Z]|[0-9])
It works well, but it deletes the first letter too. How can I fix this?
Upvotes: 1
Views: 73
Reputation: 163632
The pattern that you tried deletes the first letter because it first matches any character 0 or more times using a non-greedy quantifier, and then captures either an uppercase char A-Z or a digit 0-9.
That captured character is part of the match, so it is deleted as well.
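To see this in action, here is a small sketch that inspects the match on one of the sample strings from the question. The variable names are only illustrative.

```python
import re

# The pattern from the question: lazy prefix, then a captured uppercase letter or digit.
pattern = re.compile(r"^.*?([A-Z]|[0-9])")

text = "- Presa di coscienza"
m = pattern.match(text)

whole = m.group(0)     # everything the pattern consumed, capture included
captured = m.group(1)  # only the captured character

# Replacing the match with "" removes the captured character too.
result = pattern.sub("", text)

print(repr(whole))     # '- P'
print(repr(captured))  # 'P'
print(repr(result))    # 'resa di coscienza'
```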
Instead, you can use a positive lookahead (?=[A-Z0-9])
that asserts that what is directly to the right is either an uppercase char A-Z or a digit, using a single character class.
Instead of using the non-greedy .*?
you can use a negated character class that matches 0+ times any char except a newline, an uppercase A-Z or a digit, which prevents unnecessary backtracking.
^[^A-Z0-9\r\n]*(?=[A-Z0-9])
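A minimal sketch of how this pattern could be used from Python with re.sub, run against the three sample strings from the question. The names LEADING_JUNK and strip_leading are made up for this example.

```python
import re

# Negated class consumes the prefix; the lookahead checks the next char without consuming it.
LEADING_JUNK = re.compile(r"^[^A-Z0-9\r\n]*(?=[A-Z0-9])")

def strip_leading(text):
    """Remove everything before the first uppercase letter or digit."""
    return LEADING_JUNK.sub("", text)

samples = ["- Presa di coscienza", "-3D is better than 2D", "Basi di ottica"]
cleaned = [strip_leading(s) for s in samples]

for line in cleaned:
    print(line)
```

Note that a string with no uppercase letter or digit is returned unchanged, because the lookahead never succeeds. If titles can start with a lowercase letter, one option is to add a-z to both character classes.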
Explanation
^ : Start of string
[^A-Z0-9\r\n]* : Negated character class, match 0+ times any char except what is listed
(?=[A-Z0-9]) : Positive lookahead, assert what is directly to the right is a char A-Z or digit 0-9
Upvotes: 1
Reputation: 351
Positive lookahead is your answer:
^.*?(?=[A-Z]|[0-9])
The extra ?=
makes all the difference:
A positive lookahead only checks that the [A-Z]|[0-9]
group comes right after the main expression (e.g. ^.*?
) without including it in the match, so it is not deleted.
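As a rough sketch, assuming the titles arrive as one multi-line string (the question does not say), the same pattern can be applied to every line by passing re.MULTILINE so that ^ matches at the start of each line. The variable names are illustrative.

```python
import re

# Assumption: all titles are in a single string, one per line.
titles = "- Presa di coscienza\n-3D is better than 2D\nBasi di ottica"

# re.MULTILINE makes ^ match at the start of every line, not just the start of the string.
cleaned_titles = re.sub(r"^.*?(?=[A-Z]|[0-9])", "", titles, flags=re.MULTILINE)

print(cleaned_titles)
```

Since . does not match a newline by default, the lazy prefix never runs past the end of its own line.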
Upvotes: 1