friedman
friedman

Reputation: 675

REGEXEXTRACT with capturing group

I wonder if there is a way to refer to the group captured in same expression when using REGEXEXTRACT() in Google Sheets?

Let's say we have a sample string: aaa123bbb123ccc456ddd123eee123fff456ggg

and we'd like to extract the part where some 3 digits occure at least 3 times. Normally I would use regex like this: (\d{3})(?:[^\1]*\1){2,}

but how to refer to the first group in =REGEXEXTRACT(A1;"(\d{3})(?:[^\1]*\1){2,}")? This one returns error in Sheets.

Upvotes: 10

Views: 9649

Answers (4)

j0lly_Rog3r
j0lly_Rog3r

Reputation: 21

Found this by complete accident! It's exactly what we've been looking for but it works counter to documentation/expectation.

regexextract(re(re1|re2|re3)re)

Will only return the matching re1, re2, or re3 and not the re's outside of the parenthesis.

If you are only looking for 1 thing, you can use re(re1)re

Upvotes: 0

Max Makhrov
Max Makhrov

Reputation: 18717

Without a script you may find the N-th occurrence of a group:

=REGEXEXTRACT("aaa123bbb123ccc456ddd123eee123fff456ggg","(\d{3}\D*){3}")

The result is:

456ddd

Or this re for digits only:

(?:(\d{3})(?:\D*)){3}

Upvotes: 1

yosefrow
yosefrow

Reputation: 2268

The way I emulated capture group behavior in google sheets is buy using REGEXEXTRACT Inline with REGEXREPLACE

For example

=REGEXREPLACE(A1, "word$", "special" & REGEXEXTRACT(A1, "word$"))

Explained:

# REGEXREPLACE(my_cell, regex_to_match_word, text & capture_function)
=REGEXREPLACE(
    A1, 
    "word$", 
    "special" & 
        # REGEXEXTRACT("my_cell", "capture_regex")
        REGEXEXTRACT(
            A1, 
            "word$"
        )
)

References

REGEXREPLACE: https://support.google.com/docs/answer/3098245?hl=en

REGEXEXTRACT: https://support.google.com/docs/answer/3098244?hl=en

Upvotes: 1

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626920

There is no backreference support in RE2 patterns, you need to write a custom JS function to get what you need:

function IS_THREE_DIGIT_REPEATING(input) {
  var rx = /(\d{3})(.*\1){2}/;
  var res = rx.exec(input);
  return res ? res[1] : "No";
}

It will print the contents of the first capturing group in the cell (the 3 digits that are repeating) or No if there is no match.

enter image description here

Pattern details

  • (\d{3}) - Capturing group 1: three digits
  • (.*\1){2} - 2 consecutive occurrences of any 0+ chars other than linebreak chars followed with the same value as captured in Group 1.

Upvotes: 9

Related Questions