Victor Semenov

Reputation: 47

Python regex to split both on number and on capital letter

I cannot get exactly what I want with the regex I have. For example, given the string

2000H2HfH

I need to get ['2000','H','2','Hf','H'].

So I need to split on each number and on each capital letter, keeping any lowercase letter that follows the capital.

I use ([A-Z][a-z]?)(\d+)? and lose the starting number. I understand why that happens, but how can I get it back so the result is readable?
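For reference, here is a small check of what that pattern returns with re.findall. This is only a sketch to illustrate the problem; the variable names are illustrative and not part of the original code.

```python
import re

text = "2000H2HfH"

# Two groups in sequence: every match must begin with a capital letter,
# so the leading "2000" is skipped and each result is a (letters, digits) tuple.
pairs = re.findall(r'([A-Z][a-z]?)(\d+)?', text)
print(pairs)
# => [('H', '2'), ('Hf', ''), ('H', '')]
```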

Upvotes: 2

Views: 370

Answers (2)

Mullo

Reputation: 106

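You have two capture groups one after another, so every match has to start with the letter group and can only then be followed by digits; a number at the start of the string is never captured. To achieve your goal, put both alternatives into a single group like this: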

([A-Z][a-z]?|\d+)?

Here the | symbol means alternation: the group captures either a capital letter optionally followed by one lowercase letter, or a number (one or more digits).
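A quick way to try the alternation is re.findall. The snippet below is a minimal sketch, assuming the goal is the flat list from the question; the variable names are illustrative. Note that with the trailing ? the whole group becomes optional, so the pattern can also succeed on an empty match at the end of the string. Dropping that ? avoids the extra empty item.

```python
import re

text = "2000H2HfH"

# With the trailing ?, the group is optional and the regex may match nothing,
# which findall reports as an empty string.
optional = re.findall(r'([A-Z][a-z]?|\d+)?', text)
print(optional)
# => ['2000', 'H', '2', 'Hf', 'H', '']

# Same alternation without the trailing ?: every match consumes text.
tokens = re.findall(r'([A-Z][a-z]?|\d+)', text)
print(tokens)
# => ['2000', 'H', '2', 'Hf', 'H']
```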

A very handy service for composing and testing regular expressions is https://regex101.com/

Upvotes: 0

Wiktor Stribiżew

Reputation: 626845

You may use

re.findall(r'\d+|[A-Z][a-z]*', text)

See a regex demo. Details:

  • \d+ - one or more digits
  • | - or
  • [A-Z][a-z]* - an uppercase letter followed by zero or more lowercase ones.

See a Python demo:

import re
text = "2000H2HfH"
print(re.findall(r'\d+|[A-Z][a-z]*', text))
# => ['2000', 'H', '2', 'Hf', 'H']

Upvotes: 4
