DeeKay789
DeeKay789

Reputation: 367

Regexmatch to find all string cells that match multiple words

I'm using ArrayFormula and FILTER combination to list all cells in a column that contain all of the search term words. I'm using REGEXMATCH rather than QUERY/CONTAINS/LIKE because my FILTER has other criteria that return TRUE/FALSE.

My problem seems to be precedence. So the following regex works in a limited way.

=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(.*?\bbob\b)(.*?\bcat\b)"&".*$")))

It will find Bob and cat but only if Bob precedes cat.

Google sheets fails if I try to use lookahead ?= ie

=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(?=.*?\bbob\b)(?=.*?\bcat\b)"&".*$")))

I don't want to use the '|' alternation in the string (repeat and reverse) as the input words may be many more than two so alternation becomes exponentially more complex.

Here's the test search array (each row is a single cell containing a string)...

Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive

... and the desired results I'm after.

Bob ate the dead cat
The cat ate live bob
Bob and the cat are alive

Once I have the regex sorted out, the final solution will be a user input text box where they simply enter the words that must be found in a string ie 'Bob cat'. This input string I think I can unpack into its separate words and concatenate to the above expression, however, if there's a 'best practice' way of doing this I'd like to hear.

Upvotes: 3

Views: 19725

Answers (3)

michaeldon
michaeldon

Reputation: 507

Maybe a bit easier to understand (I hate MMULT)

=query({A1:A},"select Col1 where "&join(" and ",arrayformula("Col1 matches '."&filter(B:B,B:B<>"")&".'")))

Where A contains your list of phrases and B contains your criteria words.

This part of the formula, =join(" and ",arrayformula("Col1 matches '."&filter(D3:D,D3:D<>"")&".'")) builds a query string from terms in B. for example:

Col1 matches '.cats.' and Col1 matches '.dogs.'

And then this list gets concatenated into the whole "select" expression:

select Col1 where Col1 matches '.cats.' and Col1 matches '.dogs.'

Upvotes: 0

Ed Nelson
Ed Nelson

Reputation: 10259

See if this does what you want. In B1 enter:

=arrayformula(filter(A1:A5,regexmatch(A1:A5,lower(index(split(C2," "),0,1)))*regexmatch(lower(A1:A5),lower(index(split(C2," "),0,2)))))

In C2 enter your search words with a space between them (cat Bob).

All words are changed to lower case. The index split separates the words in C2 and the separate words go in the regexmatch. Below is my shared test spreadsheet:

https://docs.google.com/spreadsheets/d/1sDNnSeqHbi0vLosxhyr8t8KXa3MzWC_WJ26eSVNnG80/edit?usp=sharing

Expanding on Max's very good answer, this will change the formula for the list of words in column C. I added an example to the shared spreadsheet (Sheet2).

=FILTER(A:A,MMULT(--REGEXMATCH(A:A,"(?i)"&TRANSPOSE(INDIRECT( "C1:C" & counta(C1:C ) ))),ROW(INDIRECT("a1:a"&COUNTA(INDIRECT( "C1:C" & counta(C1:C ) ))))^0)=COUNTA(INDIRECT( "C1:C" & counta(C1:C ) )))

Upvotes: 1

Max Makhrov
Max Makhrov

Reputation: 18707

Find 2 strings

Try:

=FILTER(A:A,REGEXMATCH(A:A,"(?i)bob.*cat|cat.*bob"))

You don't need to use ArrayFormula because filter is array formula itself.

  • (?i) - to make search case insensitive
  • bob.*cat|cat.*bob - match "bob→cat" or "cat→bob"

Find multiple strings

There's more complex formula for more words to match then 2.

Suppose we have a list in column A:

Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive
Cat is Bob
ate Cat bob

And need to find all matches of 3 words, put them in column C:

cat
ate
bob

The formula is:

=FILTER(A:A,MMULT(--REGEXMATCH(A:A, "(?i)"&TRANSPOSE(C1:C3)),ROW(INDIRECT("a1:a"&COUNTA(C1:C3)))^0)=COUNTA(C1:C3))

It uses RegexMatch of transposed list of words C1:C3, and then mmult function sums matches and =COUNTA(C1:C3) compares the number of matches with the number of words in list.

The result is:

Bob ate the dead cat
The cat ate live bob
ate Cat bob

Upvotes: 5

Related Questions