Reputation: 47
I cannot get exactly what I want with the regex I have. For example, given the string
2000H2HfH
I need to get ['2000','H','2','Hf','H'].
So I need to split on numbers and on capital letters, where a capital letter may be followed by lowercase letters.
I use this: ([A-Z][a-z]?)(\d+)?
and lose the starting number. I understand why that happens, but how can I get it back so the result is readable?
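To reproduce the problem, here is a minimal sketch. It assumes Python's re module and re.findall, neither of which is named in the question itself:

```python
import re

text = "2000H2HfH"

# The pattern from the question: every match must begin with a capital letter,
# so the leading "2000" has no letter to attach to and is skipped.
pairs = re.findall(r'([A-Z][a-z]?)(\d+)?', text)
print(pairs)
# => [('H', '2'), ('Hf', ''), ('H', '')]
```

With two groups, findall returns tuples, and a group that did not take part in the match shows up as an empty string.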
Upvotes: 2
Views: 370
Reputation: 106
You have two capture groups one after another, so a number is only captured when it directly follows a letter group. To achieve your goal, you should combine them into a single alternation like this:
([A-Z][a-z]?|\d+)?
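Here the | symbol means OR: the group captures either a capital letter optionally followed by one lowercase letter, or a number. Note that the trailing ? makes the whole group optional, so the pattern can also match an empty string.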
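As a rough illustration of the alternation, here is a minimal sketch. It assumes Python's re.findall, and it drops the outer group and the trailing ? so that no empty matches are produced (both changes are my assumption, not part of the pattern as written above):

```python
import re

text = "2000H2HfH"

# Each match is either a capital letter optionally followed by one
# lowercase letter, or a run of one or more digits.
tokens = re.findall(r'[A-Z][a-z]?|\d+', text)
print(tokens)
# => ['2000', 'H', '2', 'Hf', 'H']
```

Because the pattern has no capture groups, findall returns the whole matches as a flat list of strings.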
There is a very nice service for composing and testing regular expressions: https://regex101.com/
Upvotes: 0
Reputation: 626845
You may use
re.findall(r'\d+|[A-Z][a-z]*', text)
See a regex demo. Details:
\d+ - one or more digits
| - or
[A-Z][a-z]* - an uppercase letter followed by zero or more lowercase ones.
See a Python demo:
import re
text = "2000H2HfH"
print( re.findall(r'\d+|[A-Z][a-z]*', text) )
# => ['2000', 'H', '2', 'Hf', 'H']
Upvotes: 4