jackypan1989

Reputation: 2866

Regular expression for multiple matching in a hierarchical string

I would like to break a hierarchical string into all of its prefixes, one match per level of the hierarchy.

For example:

F          21           W          2121         /02
[A-Z]{1}   [0-9]{1,n}   [A-Z]{1}   [0-9]{1,n}   /{1}[0-9]{1,n}

Result:

F21W2121/02 -> F, F21, F21W, F21W2121, F21W2121/02 

G06Q30/00 -> G, G06, G06Q, G06Q30, G06Q30/00

Is there a good way to parse this? I am stuck.

Upvotes: 0

Views: 823

Answers (2)

Burhan Khalid

Reputation: 174624

Unless I am missing something:

>>> import re
>>> ptrn = '((((([A-Z])[0-9]+)[A-Z])[0-9]+)/[0-9]+)'
>>> re.match(ptrn, 'G06Q30/00').groups()
('G06Q30/00', 'G06Q30', 'G06Q', 'G06', 'G')
>>> re.match(ptrn, 'F21W2121/02').groups()
('F21W2121/02', 'F21W2121', 'F21W', 'F21', 'F')

The groups come back longest first, so reverse the tuple to get the matches ordered from shortest to longest.
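A minimal sketch of that reversal, assuming the same pattern as above. The helper name `prefixes` is hypothetical and only used for illustration:

```python
import re

# Same nested-group pattern as above; group 1 is the outermost (longest) match.
PTRN = re.compile(r'((((([A-Z])[0-9]+)[A-Z])[0-9]+)/[0-9]+)')

def prefixes(code):
    """Return the hierarchical prefixes of `code`, shortest first."""
    m = PTRN.match(code)
    if m is None:
        return []  # input does not follow the expected layout
    # groups() is ordered outermost to innermost, so reverse it.
    return list(reversed(m.groups()))

print(prefixes('F21W2121/02'))  # ['F', 'F21', 'F21W', 'F21W2121', 'F21W2121/02']
print(prefixes('G06Q30/00'))    # ['G', 'G06', 'G06Q', 'G06Q30', 'G06Q30/00']
```

Returning an empty list for non-matching input is a design choice of this sketch; raising an exception may suit stricter callers better.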

Upvotes: 1

Avinash Raj

Reputation: 174696

I think you want something like this:

^((((([A-Z])\d+)[A-Z])\d+)/\d+)$


>>> import re
>>> s = "F21W2121/02"
>>> re.findall(r'^((((([A-Z])\d+)[A-Z])\d+)/\d+)$', s)
[('F21W2121/02', 'F21W2121', 'F21W', 'F21', 'F')]
>>> re.findall(r'^((((([A-Z])\d+)[A-Z])\d+)/\d+)$', "G06Q30/00")
[('G06Q30/00', 'G06Q30', 'G06Q', 'G06', 'G')]
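The `^` and `$` anchors matter when the input may carry extra trailing text. As a rough sketch, `re.fullmatch` has much the same effect as anchoring the pattern on both ends (it is slightly stricter, since `$` also matches just before a trailing newline). The helper name `split_levels` is hypothetical:

```python
import re

# Same nested groups as above, without explicit anchors.
PATTERN = re.compile(r'((((([A-Z])\d+)[A-Z])\d+)/\d+)')

def split_levels(code):
    """Return prefixes shortest first, or None unless the whole string matches."""
    m = PATTERN.fullmatch(code)
    return list(reversed(m.groups())) if m else None

print(split_levels("G06Q30/00"))     # ['G', 'G06', 'G06Q', 'G06Q30', 'G06Q30/00']
print(split_levels("G06Q30/00abc"))  # None: trailing text is rejected

# By contrast, re.match only anchors at the start and accepts the prefix.
print(PATTERN.match("G06Q30/00abc").group(1))  # G06Q30/00
```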

Upvotes: 1
