Jeff

Reputation: 309

Google Style Regular Expression Search

It's been several years since I last used regular expressions, and I was hoping to get some help with something I'm working on. You know how Google's search is quite powerful: it treats anything inside quotes as a literal phrase and excludes anything with a minus sign in front of it.

Example: "this is literal" -donotfindme site:examplesite.com

This example would search the website examplesite.com for pages that contain the phrase "this is literal" but do not include the word donotfindme.

Obviously I'm not looking for something as complex as Google; I just wanted to show where my project is heading.

Anyway, I wanted to start with the basics, which is the literal phrases inside quotes. With the help of another question on this site, I was able to do the following:

(This is PHP.)

$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';

$regex = preg_match($pattern, $search, $matches);

print_r($matches);

But this outputs "this" (with the quotes) instead of the desired this (without them), and it doesn't work at all for multiple phrases in quotes. Could someone point me in the right direction?

I don't necessarily need code; even a good place with tutorials would probably do the job.

Thanks!

Upvotes: 1

Views: 1780

Answers (4)

e-motiv

Reputation: 5893

Here is a complete answer that handles all the kinds of search terms (plain words, quoted literals, minus-prefixed exclusions) and replaces them with SQL conditions. (For visitors arriving from Google, at least.)

It probably should not be done with a single regular expression, though:

  1. One huge, complex regular expression would be hard for you or other developers to maintain and extend.
  2. Splitting the work up this way might even be faster.

It might still need a lot of improvement, but at least here is a complete working solution in a class. There is a bit more in here than the question asked for, but it illustrates the reasons behind some of the choices.

class mySearchToSql extends mysqli {

    // Collected WHERE fragments, filled by where() and printed by showSQL()
    protected $W = array();

    protected function filter($what) {
        if (isset($what)) {
                    //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug

            //Split into different desires
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i',$what,$split);
                    //echo '<pre>'.var_export($split,1).'</pre>';//debug                

            //Surround with SQL
            // sur() is not static, so pass it as an object callable
            array_walk($split[1],array($this,'sur'),array('`Field` LIKE "%','%"'));
            array_walk($split[2],array($this,'sur'),array('`Desc` REGEXP "[[:<:]]','[[:>:]]"'));
            array_walk($split[3],array($this,'sur'),array('`Desc` NOT LIKE "%','%"'));
                    //echo '<pre>'.var_export($split,1).'</pre>';//debug

            //Add AND or OR
            $this   ->where($split[3])                      
                    ->where(array_merge($split[1],$split[2]), true);
        }
    }

    protected function sur(&$v,$k,$sur) {
        if (!empty($v))
            $v=$sur[0].$this->real_escape_string($v).$sur[1];
    }

    function where($s,$OR=false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s=(array_filter($s));
            if (empty($s)) return $this;
            if($OR==true)  
                $this->W[]='('.implode(' OR ',$s).')';
            else 
                $this->W[]='('.implode(' AND ',$s).')';
        } else 
            $this->W[]=$s;
        return $this;
    }

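    function showSQL() {
        // The original used an undefined constant L here; PHP_EOL is assumed to be the intended line break
        echo $this->W ? 'WHERE ' . implode(PHP_EOL . ' AND ', $this->W) . PHP_EOL : '';
    }
}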
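
To see what the splitting pattern in filter() produces before any SQL is wrapped around it, here is a minimal standalone sketch. Only the pattern comes from the class; the sample query and the variable names $words, $phrases and $excluded are made up for illustration.

```php
<?php
// Same pattern as in filter():
//   group 1 = bare words, group 2 = quoted phrases, group 3 = minus-prefixed words
$pattern = '/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i';
$query   = 'hello "this is literal" -donotfindme';   // hypothetical input

preg_match_all($pattern, $query, $split);

// Each group array has one slot per match; slots for groups that did not
// take part in a given match hold an empty string, so filter those out.
$words    = array_values(array_filter($split[1], 'strlen'));
$phrases  = array_values(array_filter($split[2], 'strlen'));
$excluded = array_values(array_filter($split[3], 'strlen'));

print_r(compact('words', 'phrases', 'excluded'));
// $words holds hello, $phrases holds this is literal, $excluded holds donotfindme
```

One thing to keep in mind: group 1 excludes hyphens, so a term such as well-known would be split into the word well and the exclusion known.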

Thanks to all the Stack Overflow answers that helped me get here!

Upvotes: 1

Evan Fosmark

Reputation: 101751

You're in luck, because I recently asked a similar question about string literals. You can find it here: Regex for managing escaped characters for items like string literals

I ended up using the following to search for them, and it worked perfectly:

(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1

This regex differs from the others in that it properly handles escaped quotation marks inside the string.
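
As a rough sketch of how that pattern could be used from PHP: the nowdoc syntax below is just one way to avoid double-escaping the backslashes, and the sample text and variable names are made up for illustration.

```php
<?php
// Nowdoc strings are taken literally, so the regex can be pasted as-is
// between the / delimiters.
$pattern = <<<'REGEX'
/(?<!\\)(?:\\\\)*("|')((?:\\.|(?!\1)[^\\])*)\1/
REGEX;

// Hypothetical input: one double-quoted literal containing escaped quotes,
// and one single-quoted literal.
$subject = <<<'TXT'
say "he said \"hi\"" and 'ok'
TXT;

preg_match_all($pattern, $subject, $m);

// $m[1] holds the quote character that opened each literal,
// $m[2] holds the text between the quotes (escapes left untouched).
print_r($m[2]);
```

Here the first literal is captured with its \" escapes intact, and the second one comes back as ok.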

Upvotes: 0

rivy

Reputation: 1630

Sorry, my PHP is a bit rusty, but this code will probably do what you want:

$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';

$regex = preg_match($pattern, $search, $matches);

print_r($matches[1]);

$matches[1] will contain the first captured subexpression; $matches[0] contains the full text matched by the pattern.

See preg_match in the PHP documentation for specifics about subexpressions.

I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
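
If "multiple phrases in quotes" simply means several separately quoted phrases in one search string, then the greedy .* is the likely culprit. A minimal sketch, assuming that reading of the question (the sample string and variable names are made up, and the negated character class is my substitution, not part of the code above):

```php
<?php
$search = 'hello "this" is "regular expressions"';   // hypothetical input

// Greedy: .* runs from the first quote all the way to the last one.
preg_match('/"(.*)"/', $search, $greedy);

// Negated class: [^"]* cannot cross a quote, so each match stops at the
// next quote, and preg_match_all collects every match.
preg_match_all('/"([^"]*)"/', $search, $all);

echo $greedy[1], PHP_EOL;   // this" is "regular expressions
print_r($all[1]);           // two entries: this, regular expressions
```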

Upvotes: 1

David Z

Reputation: 131640

Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:

$pattern = '/"(.*)"/';

and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It will still contain the full matched text in element 0.) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:

$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';

will match a quoted string that starts with two lowercase words, each followed by a single space, and then anything. $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
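
A quick sketch of that numbering in action; the sample string is made up for illustration:

```php
<?php
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
$search  = 'find "hello world and more" please';   // hypothetical input

preg_match($pattern, $search, $matches);

// $matches[0] is the full match including the quotes;
// elements 1 to 3 are the groups, numbered by their opening parenthesis.
echo $matches[0], PHP_EOL;   // "hello world and more"
echo $matches[1], PHP_EOL;   // hello
echo $matches[2], PHP_EOL;   // world
echo $matches[3], PHP_EOL;   // and more
```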

For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
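
The offset approach could look roughly like this. It is only a sketch: the sample string and variable names are made up, and I have swapped .* for [^"]* here, because the greedy version would swallow everything between the first and the last quote in one match.

```php
<?php
$search  = 'hello "this" is "regular expressions"';   // hypothetical input
$pattern = '/"([^"]*)"/';
$offset  = 0;
$phrases = array();

// PREG_OFFSET_CAPTURE turns every entry of $m into array(text, byte offset).
while (preg_match($pattern, $search, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    $phrases[] = $m[1][0];                       // text between the quotes
    $offset    = $m[0][1] + strlen($m[0][0]);    // resume right after this match
}

print_r($phrases);   // two entries: this, regular expressions
```

preg_match_all() does the same looping internally, so it is worth a look if you do not need control over each step.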

You could also try searching Google for "regular expression tutorial" or something similar; there are plenty of good ones out there.

Upvotes: 4
