Antoine Brunel
Antoine Brunel

Reputation: 1105

Extract All occurrences (not just the first one) matching a regex in a Google sheets cell

I'm trying to extract all occurrences matching a regex from a Google Sheets cell.

Example of a cell content in D5:

Genauer zu reden, sie sind gegeben sozusagen als Einbettungen in konkreten psychischen Erlebnissen, die in der Funktion der Bedeutungsintention oder Bedeutungserfüllung (in letzterer Hinsicht als illustrierende oder evidentmachende Anschauung) zu gewissen sprachlichen Ausdrücken gehören und mit ihnen eine phänomenologische Einheit bilden

REGEXEXTRACT(D5; "\w+edeut\w+") only returns the first match, i.e. Bedeutungsintention.

I need all of them. In this case, Bedeutungsintention and Bedeutungserfüllung

Thanks !

Upvotes: 1

Views: 897

Answers (2)

pgSystemTester
pgSystemTester

Reputation: 9928

All of these examples can be seen in this shared file

Formula Solution

=Filter(split(textjoin(" ",true,D5:D)," "),find("edeut",split(textjoin(" ",true,D5:D)," "),1)>0)

This is probably the easiest as it uses standard sheet functionality. It splits (converts to an array) the text based on a space ' ' separator. It then filters on any words that contain the specified text. From there you can use other sheet functionality to manipulate the text (transpose, unique, substitute out other characters such as periods)

Using the textjoin function will allow for this to be applied to multiple cells.

Custom Function Solution

This is basically doing the same as the first option but using app scripts. It might provide more flexibility.

 /**
 * @OnlyCurrentDoc
 */

/**
 * Finds instances of value
 * 
 * @param {range} theCells the Cell(s) with a value.
 * @param {string} theTextToSearch Text to find.
 * @param {string} theSplitter Text to split values
 * @return the results split by the thesplitter.
 * @customfunction
 */

function returnSomeText(theCells, theTextToSearch, theSplitter) {
  var allValues = theCells;
  if (Array.isArray(allValues)){
    allValues = allValues.join(" ");
  }

  var theArray = allValues.split(" ");
  var allResults =[];
  for(var i= 0;i<theArray.length;i++){
    if(theArray[i].search(theTextToSearch)>0){
      allResults.push(theArray[i]);
    }
  }
  return allResults.join(theSplitter);
}

Upvotes: 2

Erik Tyler
Erik Tyler

Reputation: 9355

To process D2:D with an array formula, try the following formula in, say, E2 (or the second cell of any otherwise empty range of columns wide enough to accommodate the maximum number of extractions that any one row may produce):

=ArrayFormula(IF(D2:D="",,SPLIT(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(REGEXEXTRACT(SPLIT(D2:D," "),"[A-Za-zäÄöÖüÜß]+edeut[A-Za-zäÄöÖüÜß]+"))),,COLUMNS(SPLIT(D2:D," "))))," ",0,1)))

This takes advantage of a quirk with QUERY whereby any number of header columns can be selected, not just 0 or 1. If we TRANSPOSE all data, then run a QUERY for nothing but headers, requesting as many headers as max results per transposed row, and then TRANSPOSE back to the original orientation, we get all matching results in one string. Then we can SPLIT that composite header string into separate cells by row.

Upvotes: 1

Related Questions