Styn

Reputation: 311

Regex to find character only if it occurs 4 times

I'm stuck on writing this regex. I tried using look-ahead and look-behind together, but I couldn't use the capture group inside the look-behind. I need to extract a character from a string ONLY if it occurs exactly 4 times in a row.

If I have these strings:

3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44

The first one will match because it has 4 A's in a row. The second one will NOT match because it has 6 B's in a row. The third one will match because it still has 4 A's. What makes it even more frustrating is that it can be any character from A to Z occurring 4 times.

Positioning does not matter.

EDIT: My attempt at the regex, which doesn't work:

(([A-Z])\2\2\2)(?<!\2*)(?!\2*)

Upvotes: 3

Views: 1058

Answers (3)

CertainPerformance

Reputation: 370729

If lookbehind is allowed: after capturing the character, use a negative lookbehind for the pattern \1. (a backreference followed by a dot). If that lookbehind matched, the start of the match would be preceded by the same character as the one just captured. Then backreference the group 3 times, and use a negative lookahead for \1:

`3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44`
.split('\n')
.forEach((str) => {
  console.log(str.match(/([a-z])(?<!\1.)\1{3}(?!\1)/i));
});

  • ([a-z]) - Capture a character
  • (?<!\1.) - Negative lookbehind: the dot covers the character that was just captured, so this checks that the captured character is not itself preceded by the same character
  • \1{3} - Match the same character that was captured 3 more times
  • (?!\1) - After the 4th match, make sure it's not followed by the same character
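
To collect every qualifying run rather than only the first, the same pattern can be given the g flag and passed to String.prototype.matchAll. This is a minimal sketch: the helper name findRunsOfFour is hypothetical, and it assumes an engine with lookbehind support (for example a recent Node.js).

```javascript
// Hypothetical helper (not from the answer): returns every run of exactly
// 4 identical letters. Assumes the engine supports lookbehind assertions.
function findRunsOfFour(str) {
  const pattern = /([a-z])(?<!\1.)\1{3}(?!\1)/gi;
  // matchAll yields one match object per run; m[0] is the 4-character run.
  return Array.from(str.matchAll(pattern), (m) => m[0]);
}

console.log(findRunsOfFour('3346AAAA44'));       // [ 'AAAA' ]
console.log(findRunsOfFour('3973BBBBBB44'));     // []
console.log(findRunsOfFour('9755BBBBBBAAAA44')); // [ 'AAAA' ]
```

Because of the i flag, a run that mixes upper and lower case of the same letter also counts as a run here.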

Upvotes: 3

The fourth bird

Reputation: 163287

Another variant is to capture the first character in group 1.

Then assert that the 2 characters to the left of the current position are not both group 1 (the second of them is the captured character itself, so this effectively checks the character before it), and match group 1 an additional 3 times, for a total of 4 identical characters.

Finally, assert that what is on the right is not group 1.

([A-Z])(?<!\1\1)\1{3}(?!\1)
  • ([A-Z]) Capture group 1, match a single char A-Z
  • (?<!\1\1) Negative lookbehind, assert what is on the left is not 2 times group 1
  • \1{3} Match 3 times group 1
  • (?!\1) Assert what is on the right is not group 1

For example

let pattern = /([A-Z])(?<!\1\1)\1{3}(?!\1)/g;
[
  "3346AAAA44",
  "3973BBBBBB44",
  "9755BBBBBBAAAA44",
  "AAAA",
  "AAAAB",
  "BAAAAB"
].forEach(s =>
  console.log(s + " --> " + s.match(pattern))
);
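
The same three-part structure carries over to other run lengths. As a sketch under that assumption, a hypothetical helper exactRunPattern(n) (not part of the answer) can build the pattern with the RegExp constructor:

```javascript
// Hypothetical helper: builds a pattern for runs of exactly n identical
// uppercase letters, following the structure described above.
function exactRunPattern(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // ([A-Z])     capture the first character of the run
  // (?<!\1\1)   the character before it must not be the same letter
  // \1{n-1}     the remaining n - 1 repetitions
  // (?!\1)      the run must not continue
  return new RegExp(`([A-Z])(?<!\\1\\1)\\1{${n - 1}}(?!\\1)`, 'g');
}

console.log('9755BBBBBBAAAA44'.match(exactRunPattern(4))); // [ 'AAAA' ]
console.log('9755BBBBBBAAAA44'.match(exactRunPattern(6))); // [ 'BBBBBB' ]
```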

Upvotes: 0

AndreyCh

Reputation: 1403

Another version without lookbehind. The matched sequence of 4 equal characters is captured in group 2.

(?:^|(?:(?=(\w)(?!\1))).)(([A-Z])\3{3})(?:(?!\3)|$)
  • (?:^|(?:(?=(\w)(?!\1))).) - either we are at the beginning of the string, or the next two characters differ from each other, in which case the first of them is consumed so that the run starts right after it
  • (([A-Z])\3{3}) Capture 4 repeated [A-Z] chars
  • (?:(?!\3)|$) - ensure the character after those 4 is different, or that it is the end of the string

As suggested by bobble-bubble in a comment, the expression above can be simplified to:

(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)
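
Since this pattern consumes the character before the run, the overall match is one character longer than the run whenever the run is not at the start of the string, so the result has to be read from group 2. A minimal sketch, where the helper name firstRunOfFour is hypothetical:

```javascript
// The lookbehind-free pattern consumes the character before the run,
// so the run is read from capture group 2 rather than from match[0].
const noLookbehind = /(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)/;

// Hypothetical helper: returns the first run of exactly 4 letters, or null.
function firstRunOfFour(str) {
  const m = noLookbehind.exec(str);
  return m ? m[2] : null;
}

console.log(firstRunOfFour('3346AAAA44'));       // AAAA
console.log(firstRunOfFour('3973BBBBBB44'));     // null
console.log(firstRunOfFour('9755BBBBBBAAAA44')); // AAAA
console.log(firstRunOfFour('-AAAA'));            // null
```

The last call illustrates a caveat: the preceding character is matched with \w, so a run preceded by a non-word character such as - is not found. If the input can contain such characters, a broader class in place of \w would probably be needed.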

Upvotes: 2
