Reputation: 367
I'm using ArrayFormula and FILTER combination to list all cells in a column that contain all of the search term words. I'm using REGEXMATCH rather than QUERY/CONTAINS/LIKE because my FILTER has other criteria that return TRUE/FALSE.
My problem seems to be precedence. So the following regex works in a limited way.
=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(.*?\bbob\b)(.*?\bcat\b)"&".*$")))
It will find Bob and cat but only if Bob precedes cat.
Google sheets fails if I try to use lookahead ?= ie
=ArrayFormula(filter(A1:A5,regexmatch(A1:A5,"(?i)^"&"(?=.*?\bbob\b)(?=.*?\bcat\b)"&".*$")))
I don't want to use the '|' alternation in the string (repeat and reverse) as the input words may be many more than two so alternation becomes exponentially more complex.
Here's the test search array (each row is a single cell containing a string)...
Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive
... and the desired results I'm after.
Bob ate the dead cat
The cat ate live bob
Bob and the cat are alive
Once I have the regex sorted out, the final solution will be a user input text box where they simply enter the words that must be found in a string ie 'Bob cat'. This input string I think I can unpack into its separate words and concatenate to the above expression, however, if there's a 'best practice' way of doing this I'd like to hear.
Upvotes: 3
Views: 19725
Reputation: 507
Maybe a bit easier to understand (I hate MMULT)
=query({A1:A},"select Col1 where "&join(" and ",arrayformula("Col1 matches '."&filter(B:B,B:B<>"")&".'")))
Where A contains your list of phrases and B contains your criteria words.
This part of the formula, =join(" and ",arrayformula("Col1 matches '."&filter(D3:D,D3:D<>"")&".'")) builds a query string from terms in B. for example:
Col1 matches '.cats.' and Col1 matches '.dogs.'
And then this list gets concatenated into the whole "select" expression:
select Col1 where Col1 matches '.cats.' and Col1 matches '.dogs.'
Upvotes: 0
Reputation: 10259
See if this does what you want. In B1 enter:
=arrayformula(filter(A1:A5,regexmatch(A1:A5,lower(index(split(C2," "),0,1)))*regexmatch(lower(A1:A5),lower(index(split(C2," "),0,2)))))
In C2 enter your search words with a space between them (cat Bob).
All words are changed to lower case. The index split separates the words in C2 and the separate words go in the regexmatch. Below is my shared test spreadsheet:
https://docs.google.com/spreadsheets/d/1sDNnSeqHbi0vLosxhyr8t8KXa3MzWC_WJ26eSVNnG80/edit?usp=sharing
Expanding on Max's very good answer, this will change the formula for the list of words in column C. I added an example to the shared spreadsheet (Sheet2).
=FILTER(A:A,MMULT(--REGEXMATCH(A:A,"(?i)"&TRANSPOSE(INDIRECT( "C1:C" & counta(C1:C ) ))),ROW(INDIRECT("a1:a"&COUNTA(INDIRECT( "C1:C" & counta(C1:C ) ))))^0)=COUNTA(INDIRECT( "C1:C" & counta(C1:C ) )))
Upvotes: 1
Reputation: 18707
Try:
=FILTER(A:A,REGEXMATCH(A:A,"(?i)bob.*cat|cat.*bob"))
You don't need to use ArrayFormula
because filter is array formula itself.
(?i)
- to make search case insensitivebob.*cat|cat.*bob
- match "bob→cat" or "cat→bob"There's more complex formula for more words to match then 2.
Suppose we have a list in column A:
Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive
Cat is Bob
ate Cat bob
And need to find all matches of 3 words, put them in column C:
cat
ate
bob
The formula is:
=FILTER(A:A,MMULT(--REGEXMATCH(A:A,
"(?i)"&TRANSPOSE(C1:C3)),ROW(INDIRECT("a1:a"&COUNTA(C1:C3)))^0)=COUNTA(C1:C3))
It uses RegexMatch
of transposed list of words C1:C3
, and then mmult
function sums matches and =COUNTA(C1:C3)
compares the number of matches with the number of words in list.
The result is:
Bob ate the dead cat
The cat ate live bob
ate Cat bob
Upvotes: 5