Reputation: 4298
I am trying to use regex to determine how many and which groups are repeated.
Input String= $$$ 12345 aaa bbb ccc ddd eee 678 $$$ aaabbbbccc aaa-bbb-ddd aab aaaaaabbbbbbbbbbbbbc a000000009999999888888
Expected Output =
$$$
12345
aaa
bbb
ccc
ddd
eee
678
$$$
aaa
bbbb
ccc
aaa
bbb
ddd
aa
b
aaaaaa
bbbbbbbbbbbbb
c
a
00000000
9999999
888888
Please note that I have separated aaa from aaaaaa bbbbbbbbbbbbb and c for visual clarity. The actual output won't have any space or newline character between the words.
Rules:
1) There could be n words made up of characters from a-zA-Z0-9$. In the above example, $$$ and 12345 are words.
2) A word could have n groups with repeated characters. E.g. aaa and a
3) What is the difference between a word and a group inside a word? E.g. what is the difference between 12345 and aab?
Answer: 12345 doesn't have any repeated element, so it stays as is without any further breakdown. However, aab has one repeated character, a, because of which it will be broken down into aa and b.
4) The output (consisting of groups) must not have any spaces or newline characters before or after the group.
I was able to separate words from each other. This was easy. I used r[$0-9a-zA-Z]+
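For reference, a minimal sketch of that word-separation step in Python, assuming the `re` module and the character class above. The variable names are only illustrative.

```python
import re

text = ("$$$ 12345 aaa bbb ccc ddd eee 678 $$$ aaabbbbccc aaa-bbb-ddd "
        "aab aaaaaabbbbbbbbbbbbbc a000000009999999888888")

# Each match is one word: a maximal run of the allowed characters.
# Inside a character class, $ is a literal dollar sign.
words = re.findall(r"[$0-9a-zA-Z]+", text)
print(words)
```

Note that aaa-bbb-ddd comes out as three words, because the hyphen is not in the character class.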
However, I am unsure how to separate groups inside the word, i.e. how do I separate a000000009999999888888 into a 00000000 9999999 888888?
I'd appreciate any help. Thanks in advance.
Here's my regex101 sheet: REGEX101
Upvotes: 0
Views: 110
Reputation: 163477
If negative lookahead is supported, you might use an alternation and 2 capturing groups.
([a-z0-9$])\1+|(?:([a-z0-9$])(?!\2))+
([a-z0-9$])\1+    Match consecutive characters by capturing what is in the character class in group 1, followed by repeating group 1 one or more times
|    Or
(?:    Non-capturing group
([a-z0-9$])    Match what is in the character class and capture it in group 2
(?!\2)    Negative lookahead to assert that what follows is not group 2
)+    Close the non-capturing group and repeat it one or more times
You did not specify a tool or language, but as an example, the full matches can be retrieved in PHP or in Python.
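A minimal Python sketch of collecting the full matches, assuming the `re` module. The helper name `split_groups` is hypothetical, and the A-Z range is an addition made here to cover the a-zA-Z0-9$ alphabet stated in the question. `finditer` is used rather than `findall` because the pattern has capturing groups, so `findall` would return tuples of group contents instead of the full matches.

```python
import re

# Pattern from the answer, with A-Z added to both character classes
# (an assumption, to match the alphabet given in the question).
PATTERN = re.compile(r"([a-zA-Z0-9$])\1+|(?:([a-zA-Z0-9$])(?!\2))+")

def split_groups(text):
    """Return every full match, in order of appearance (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(split_groups("$$$ 12345 aab aaabbbbccc a0099"))
# ['$$$', '12345', 'aa', 'b', 'aaa', 'bbbb', 'ccc', 'a', '00', '99']
```

Spaces and hyphens are simply skipped, since neither alternative can match them, so there is no need to split the input into words first.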
Upvotes: 1