Reputation: 2866
I would like to split a hierarchical string into all of its prefix levels, from the shortest to the full string.
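For example, the string has this structure (shown with spaces between the parts; n means any number of digits):
F 21 W 2121 /02
[A-Z]{1} [0-9]{1,n} [A-Z]{1} [0-9]{1,n} /{1}[0-9]{1,n}
Expected result: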
F21W2121/02 -> F, F21, F21W, F21W2121, F21W2121/02
G06Q30/00 -> G, G06, G06Q, G06Q30, G06Q30/00
Is there a good way to parse this? I am stuck.
Upvotes: 0
Views: 823
Reputation: 174624
Unless I am missing something:
>>> import re
>>> ptrn = '((((([A-Z])[0-9]+)[A-Z])[0-9]+)/[0-9]+)'
>>> re.match(ptrn, 'G06Q30/00').groups()
('G06Q30/00', 'G06Q30', 'G06Q', 'G06', 'G')
>>> re.match(ptrn, 'F21W2121/02').groups()
('F21W2121/02', 'F21W2121', 'F21W', 'F21', 'F')
groups() returns the groups from the outermost (longest) to the innermost (shortest), so you can simply reverse the tuple to get the matches in order of increasing length.
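To get the order asked for in the question, reversing the tuple could look like the sketch below. The variable name levels is just an illustration, and the code assumes the input actually matches (re.match returns None otherwise).

```python
import re

# Same nested-group pattern as above; each group wraps the previous level.
ptrn = r'((((([A-Z])[0-9]+)[A-Z])[0-9]+)/[0-9]+)'

match = re.match(ptrn, 'G06Q30/00')

# groups() goes longest to shortest, so reverse it with a slice.
levels = match.groups()[::-1]
print(levels)  # ('G', 'G06', 'G06Q', 'G06Q30', 'G06Q30/00')
```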
Upvotes: 1
Reputation: 174696
I think you want something like this:
^((((([A-Z])\d+)[A-Z])\d+)/\d+)$
>>> import re
>>> s = "F21W2121/02"
>>> re.findall(r'^((((([A-Z])\d+)[A-Z])\d+)/\d+)$', s)
[('F21W2121/02', 'F21W2121', 'F21W', 'F21', 'F')]
>>> re.findall(r'^((((([A-Z])\d+)[A-Z])\d+)/\d+)$', "G06Q30/00")
[('G06Q30/00', 'G06Q30', 'G06Q', 'G06', 'G')]
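If it helps, the same idea can be wrapped in a small function. This is only a sketch: the name hierarchy and the choice to return None for non-matching input are my own assumptions, not something the question asked for. It uses re.fullmatch (Python 3.4+) in place of the ^ and $ anchors, so trailing junk is rejected rather than silently ignored.

```python
import re

# Nested groups: each level of the hierarchy wraps the one before it.
PATTERN = re.compile(r'((((([A-Z])\d+)[A-Z])\d+)/\d+)')

def hierarchy(code):
    """Return the prefix levels from shortest to longest, or None if no match."""
    m = PATTERN.fullmatch(code)  # the whole string must match
    if m is None:
        return None
    # groups() is ordered longest to shortest, so reverse it.
    return list(reversed(m.groups()))

print(hierarchy("F21W2121/02"))
print(hierarchy("G06Q30/00xyz"))  # trailing text, so no match
```

With the sample strings from the question this gives the levels in the F, F21, F21W, ... order, and anything that does not fit the whole pattern comes back as None.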
Upvotes: 1