Reputation: 499
I have an input that looks like this:
<ID>0<VAL>a1b<ID>1<VAL>a2b<ID>2<VAL>a3b<ID>3<VAL>a4b
I need to capture key-value pairs (e.g. id - val) or at least an array of groups like the following:
[0, a1b, 1, a2b, 2, a3b, 3, a4b]
Capturing just one pair (i.e. when the input contains only a single pair) works with this:
(?>(?:<ID>(\d+))(?:<VAL>(.+)))?
The result is: [0, a1b].
But it doesn't work for multiple pairs: it captures 0 as the first group, then takes the rest of the input after the first <VAL> tag as the second group, as in: [0, a1b<ID>1<VAL>a2b<ID>2<VAL>a3b<ID>3<VAL>a4b]
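For readers following along, the behaviour described above can be reproduced with a minimal Python sketch. It assumes Python's re module and swaps the atomic group (?>...) for a plain non-capturing group (?:...) so it also runs on versions before 3.11; the names greedy, lazy, pairs and flat are illustrative only. The second pattern is one possible direction, not the only one: it makes the value lazy and stops at the next <ID> or at the end of the input.

```python
import re

text = "<ID>0<VAL>a1b<ID>1<VAL>a2b<ID>2<VAL>a3b<ID>3<VAL>a4b"

# Assumption: (?:...) stands in for the atomic group (?>...) of the original
# pattern so that the sketch runs on Python versions older than 3.11.
greedy = re.compile(r"(?:(?:<ID>(\d+))(?:<VAL>(.+)))?")

# The greedy .+ runs to the end of the input and swallows the later pairs.
greedy_groups = list(greedy.match(text).groups())

# One possible direction: a lazy value that stops at the next <ID> or at the end.
lazy = re.compile(r"<ID>(\d+)<VAL>(.+?)(?=<ID>|$)")
pairs = lazy.findall(text)                          # list of (id, value) tuples
flat = [item for pair in pairs for item in pair]    # flattened array of groups

print(greedy_groups)
print(flat)
```

Here findall returns one tuple per pair, and the list comprehension flattens them into the array form asked for above.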
Can someone point me to a direction I should look into?
UPDATE: what if <ID> and <VAL> are some special chars, for example 0x8F and 0x9F?
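As a rough sketch of the single-character case, and assuming the delimiters are the code points U+008F and U+009F in a Python str (the question does not say whether they are bytes or characters), a negated character class is enough to stop each value at the next delimiter. The names ID_SEP and VAL_SEP are hypothetical.

```python
import re

ID_SEP = "\x8f"   # assumed stand-in for <ID>
VAL_SEP = "\x9f"  # assumed stand-in for <VAL>

# Build the same four pairs as in the sample input, with the special chars as tags.
text = "".join(f"{ID_SEP}{i}{VAL_SEP}a{i + 1}b" for i in range(4))

# [^\x8f\x9f]* consumes a value up to, but not including, the next delimiter.
pattern = re.compile(r"\x8f(\d+)\x9f([^\x8f\x9f]*)")
pairs = pattern.findall(text)

print(pairs)
```

Because each delimiter is a single character, no lookahead or lazy quantifier is needed in this variant.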
Upvotes: 1
Views: 86
Reputation: 499
@bobble-bubble's solution is the most efficient (according to regex101): 4 matches in 72 steps and 1ms, but it's very restrictive. To fix this, the \w can be replaced with [a-z\d], and then it becomes even faster: 4 matches in 72 steps and 0ms.
@WiktorStribiżew's solution is the next most efficient: 4 matches in 64 steps and 4ms.
@albina's solution is the least efficient: 7 matches in 153 steps and 10ms
Upvotes: 1
Reputation: 1985
This regex matches keys and then values.
(?<=<ID>)(\d+)(?=<VAL>)|(?<=<VAL>)[a-z\d]*(?=<ID>)
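To see what the pattern above returns on the sample input from the question, here is a minimal Python sketch using re.finditer. Note that the value alternative requires a following <ID>, so the last value (a4b) is not matched, which is consistent with the 7 matches reported in the comparison above. The variant pattern, which also accepts the end of the input, is a hypothetical adjustment and not part of the original answer.

```python
import re

text = "<ID>0<VAL>a1b<ID>1<VAL>a2b<ID>2<VAL>a3b<ID>3<VAL>a4b"

# The pattern exactly as given: keys and values alternate as separate matches.
pattern = re.compile(r"(?<=<ID>)(\d+)(?=<VAL>)|(?<=<VAL>)[a-z\d]*(?=<ID>)")
matches = [m.group(0) for m in pattern.finditer(text)]

# Hypothetical variant: the value may also be followed by the end of the input.
variant = re.compile(r"(?<=<ID>)(\d+)(?=<VAL>)|(?<=<VAL>)[a-z\d]*(?=<ID>|$)")
all_matches = [m.group(0) for m in variant.finditer(text)]

print(matches)      # 7 items, the trailing value is missing
print(all_matches)  # 8 items, including the trailing value
```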
The pattern has 2 alternatives:
- (?<=<ID>)(\d+)(?=<VAL>) matches a key \d+ between <ID> and <VAL> using a positive lookbehind and lookahead:
  - (?<=<ID>) is a positive lookbehind
  - (?=<VAL>) is a positive lookahead
- (?<=<VAL>)[a-z\d]*(?=<ID>) matches a value between <VAL> and <ID> using a positive lookbehind and lookahead:
  - [a-z\d]* matches a value
  - (?<=<VAL>) is a positive lookbehind
  - (?=<ID>) is a positive lookahead
Upvotes: 2