EightFawn

Reputation: 23

Delete all the text before a letter or a number

I need to delete all the text before the first letter or number, using Python.

The strings I have to deal with can be:

- Presa di coscienza

-3D is better than 2D

Basi di ottica

And the result has to be:

Presa di coscienza

3D is better than 2D

Basi di ottica

Searching on the internet, I built this regex:

^.*?([A-Z]|[0-9])

It works well, but it deletes the first letter too. How can I keep that character?

Upvotes: 1

Views: 73

Answers (2)

The fourth bird

Reputation: 163632

The pattern you tried deletes the first letter because it first matches any character 0 or more times using a non-greedy quantifier, and then captures either an uppercase char A-Z or a digit 0-9.

That capture is part of the match, and will be deleted as well.
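To make this concrete, here is a minimal sketch using Python's re module with the pattern from the question. The function names are hypothetical, chosen only for illustration. It shows that replacing the whole match drops the captured character, and that substituting the group back with \1 is one possible workaround.

```python
import re

# Pattern from the question: the capture group is part of the overall match.
ORIGINAL = re.compile(r"^.*?([A-Z]|[0-9])")

def strip_with_original(text: str) -> str:
    """Replace the whole match (prefix plus first uppercase letter/digit) with nothing."""
    return ORIGINAL.sub("", text)

def strip_keeping_group(text: str) -> str:
    """Replace the match with the captured character, so that character survives."""
    return ORIGINAL.sub(r"\1", text)

print(strip_with_original("- Presa di coscienza"))  # resa di coscienza
print(strip_keeping_group("- Presa di coscienza"))  # Presa di coscienza
```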

Instead, you can use a positive lookahead (?=[A-Z0-9]), which asserts that what is directly to the right is either an uppercase char A-Z or a digit, using a single character class.

Rather than the non-greedy .*?, you can use a negated character class that matches 0+ times any char except a newline, an uppercase A-Z or a digit. This prevents unnecessary backtracking.

^[^A-Z0-9\r\n]*(?=[A-Z0-9])

Explanation

  • ^ Start of string
  • [^A-Z0-9\r\n]* Negated character class, match 0+ times any char except what is listed
  • (?=[A-Z0-9]) Positive lookahead, assert what is directly to the right is a char A-Z or digit 0-9
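As a rough sketch of how the pattern above could be applied in Python, the snippet below runs re.sub over the three sample strings from the question. The helper name strip_prefix is an assumption made for this example.

```python
import re

# Negated character class followed by a lookahead: the first uppercase
# letter or digit is asserted, not consumed, so it is kept.
PREFIX = re.compile(r"^[^A-Z0-9\r\n]*(?=[A-Z0-9])")

def strip_prefix(text: str) -> str:
    """Remove everything before the first uppercase letter or digit."""
    return PREFIX.sub("", text)

for sample in ["- Presa di coscienza", "-3D is better than 2D", "Basi di ottica"]:
    print(strip_prefix(sample))
# Presa di coscienza
# 3D is better than 2D
# Basi di ottica
```

A string that contains no uppercase letter or digit does not match at all, so it is returned unchanged.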


Upvotes: 1

Oneiros

Reputation: 351

Positive lookahead is your answer:

^.*?(?=[A-Z]|[0-9])

The extra ?= makes all the difference:

A positive lookahead asserts that [A-Z]|[0-9] matches right after the main expression (e.g. ^.*?) without consuming it, so that character is not included in the result.
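A short Python sketch of this pattern follows. The function name and the any_case parameter are hypothetical. The re.IGNORECASE variant is an assumption about what the question means by "any letter", since [A-Z] on its own only stops at uppercase letters.

```python
import re

# The lazy prefix stops at the first position where the lookahead succeeds.
LAZY = re.compile(r"^.*?(?=[A-Z]|[0-9])")
# Assumed variant: also stop at lowercase letters.
LAZY_ANY_CASE = re.compile(r"^.*?(?=[A-Z]|[0-9])", re.IGNORECASE)

def strip_lazy(text: str, any_case: bool = False) -> str:
    """Remove everything before the first letter or digit the lookahead accepts."""
    pattern = LAZY_ANY_CASE if any_case else LAZY
    return pattern.sub("", text)

print(strip_lazy("- Presa di coscienza"))                 # Presa di coscienza
print(strip_lazy("- presa di coscienza"))                 # - presa di coscienza
print(strip_lazy("- presa di coscienza", any_case=True))  # presa di coscienza
```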

Upvotes: 1
