watchtower

Reputation: 4298

Extract count of groups and groups within a word using regex

I am trying to use regex to determine how many and which groups are repeated.

Input String= $$$ 12345 aaa bbb ccc ddd eee 678 $$$ aaabbbbccc aaa-bbb-ddd aab aaaaaabbbbbbbbbbbbbc a000000009999999888888

Expected Output = 
$$$ 
12345 
aaa  
bbb  
ccc  
ddd  
eee  
678 
$$$ 

aaa
bbbb
ccc 

aaa
bbb
ddd 

aa
b 

aaaaaa
bbbbbbbbbbbbb
c 

a
00000000
9999999
888888

Please note that I have separated aaa from aaaaaa bbbbbbbbbbbbb and c for visual clarity. The actual output won't have any space or newline character between the words.

Rules:

1) There could be n number of words with characters among a-zA-Z0-9$. In above example, $$$ and 12345 are words.

2) A word could have n groups with repeated characters. E.g. aaa and a

3) What is the difference between a word and a group inside word? E.g. What is the difference between 12345 and aab.

Answer: 12345 doesn't have any repeated element. So, this stays as is without any further breakdown. However, aab has one repeated character a because of which it will be broken down into aa and b.

4) The output (consisting of groups) must not have any spaces or newline characters before or after the group.

I was able to separate the words from each other; this was easy with r[$0-9a-zA-Z]+ as the pattern. However, I am unsure how to separate the groups inside a word, i.e. how do I separate a000000009999999888888 into a 00000000 9999999 888888?
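
For reference, the word-splitting step could look roughly like this. It is only a sketch: Python is an assumption (the post does not name a language), and split_words is a made-up helper name.

```python
import re

# The word pattern quoted above: runs of $, digits, and ASCII letters.
WORD = re.compile(r"[$0-9a-zA-Z]+")

def split_words(text):
    """Return every maximal run of allowed characters, in order."""
    return WORD.findall(text)

print(split_words("$$$ 12345 aaa-bbb-ddd aab"))
# ['$$$', '12345', 'aaa', 'bbb', 'ddd', 'aab']
```

This only yields whole words such as aab; it does not yet break them into aa and b.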

I'd appreciate any help. Thanks in advance.

Here's my regex101 sheet: REGEX101

Upvotes: 0

Views: 110

Answers (1)

The fourth bird

Reputation: 163477

If negative lookahead is supported, you might use an alternation and 2 capturing groups.

([a-z0-9$])\1+|(?:([a-z0-9$])(?!\2))+

Regex demo

  • ([a-z0-9$])\1+ Match consecutive identical characters: capture what is in the character class in group 1, followed by a backreference to group 1 repeated one or more times
  • | Or
  • (?: Non capturing group
    • ([a-z0-9$]) Match what is in the character class and capture in group 2
    • (?!\2) Negative lookahead asserting that what directly follows is not the character captured in group 2
  • )+ Close non capturing group and repeat one or more times
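
To see which side of the alternation produced each match, one option is to check whether group 1 took part. A minimal sketch, assuming Python's re module; classify and the labels "repeated" and "distinct" are names invented for this illustration.

```python
import re

# Same pattern as above: a run of one repeated character (group 1),
# or a run of characters where none is directly followed by itself (group 2).
PATTERN = re.compile(r"([a-z0-9$])\1+|(?:([a-z0-9$])(?!\2))+")

def classify(text):
    """Pair each full match with the alternative that produced it."""
    result = []
    for m in PATTERN.finditer(text):
        # group(1) is None when the second alternative matched.
        kind = "repeated" if m.group(1) is not None else "distinct"
        result.append((m.group(0), kind))
    return result

print(classify("aab 12345"))
# [('aa', 'repeated'), ('b', 'distinct'), ('12345', 'distinct')]
```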

You did not specify a tool or language, but as an example, the full matches can be retrieved in PHP or in Python.
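
In Python, a minimal sketch could look like the following. re.finditer is used rather than re.findall because findall returns the capture groups, not the full match, when a pattern contains groups. The function name split_groups is made up for this example.

```python
import re

PATTERN = re.compile(r"([a-z0-9$])\1+|(?:([a-z0-9$])(?!\2))+")

def split_groups(text):
    """Return the full text of every match, in order of appearance."""
    return [m.group(0) for m in PATTERN.finditer(text)]

groups = split_groups("aab aaabbbbccc")
print(groups)           # ['aa', 'b', 'aaa', 'bbbb', 'ccc']
print(len(groups))      # 5
print("".join(groups))  # aabaaabbbbccc  (no spaces or newlines between groups)
```

Note that the pattern as written covers lowercase letters only; A-Z would have to be added to both character classes if uppercase input is expected.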

Upvotes: 1
