Andresch Serj

Reputation: 37458

PHP/PCRE/regular expressions: splitting a search term apart

I am trying to split a typical Google search string into its parts. For example, the string could be: "how to" engine -fuel

So I want to get "how to", engine and -fuel separately.

I tried the following preg_match_all, but I also get "how" and "to" separately, which might make the result unnecessarily difficult to process.

preg_match_all(
     '=(["]{1}[^"]{1,}["]{1})'
    .'|([-]{1}[^ ]{1,}[ ]{1})'
    .'|([^-"]{1}[^ ]{1,}[ ]{1})=si', 
  $filter, 
  $matches,
  PREG_PATTERN_ORDER);

Does anyone have an idea how to do this right?

Upvotes: 2

Views: 96

Answers (2)

Bart Kiers

Reputation: 170298

Try:

$q = '"how to" engine -fuel';
preg_match_all('/"[^"]*"|\S+/', $q, $matches);
print_r($matches);

which will print:

Array
(
    [0] => Array
        (
            [0] => "how to"
            [1] => engine
            [2] => -fuel
        )

)

Meaning:

"[^"]*"    # match a quoted string
|          # OR
\S+        # 1 or more non-space chars
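Once the tokens are separated, they usually still need to be sorted into phrases, required terms and excluded terms. The following is only a sketch under assumptions: the helper name splitSearchQuery and the array keys phrases, include and exclude are made up for illustration and are not part of the question or of any library.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
// Tokenizes with the pattern from above, then classifies each token.
function splitSearchQuery(string $query): array
{
    $parts = ['phrases' => [], 'include' => [], 'exclude' => []];

    // A quoted string, or otherwise a run of non-space characters.
    preg_match_all('/"[^"]*"|\S+/', $query, $matches);

    foreach ($matches[0] as $token) {
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Quoted phrase: drop the surrounding quotes.
            $parts['phrases'][] = substr($token, 1, -1);
        } elseif (strlen($token) > 1 && $token[0] === '-') {
            // Excluded term: drop the leading minus sign.
            $parts['exclude'][] = substr($token, 1);
        } else {
            // Everything else is treated as a plain term.
            $parts['include'][] = $token;
        }
    }

    return $parts;
}

print_r(splitSearchQuery('"how to" engine -fuel'));
```

Note that this sketch does not handle a negated phrase such as -"not this": the \S+ alternative matches -"not first, so the phrase is split at the space.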

Upvotes: 2

Cylian

Reputation: 11181

Try this

(?i)("[^"]+") +([a-z]+) +(\-[a-z]+)\b

Code

if (preg_match('/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i', $subject, $regs)) {
    $howto = $regs[1];
    $engine = $regs[2];
    $fuel = $regs[3];
} else {
    $result = "";
}

Explanation

(?i)        # Match the remainder of the regex with the options: case insensitive (i)
(           # Match the regular expression below and capture its match into backreference number 1
   "           # Match the character “"” literally
   [^"]        # Match any character that is NOT a “"”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "           # Match the character “"” literally
)
\           # Match the character “ ” literally
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(           # Match the regular expression below and capture its match into backreference number 2
   [a-z]       # Match a single character in the range between “a” and “z”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\           # Match the character “ ” literally
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(           # Match the regular expression below and capture its match into backreference number 3
   \-          # Match the character “-” literally
   [a-z]       # Match a single character in the range between “a” and “z”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b          # Assert position at a word boundary
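Keep in mind that this pattern expects exactly one quoted phrase, then one word, then one negated word, in that order. A quick check, as a sketch only (the sample queries and the $results array are made up for illustration):

```php
<?php
// Same pattern as in the code above.
$pattern = '/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i';

// Sample queries (assumptions for illustration): only the first one
// has the phrase, word, negated-word order the pattern requires.
$queries = [
    '"how to" engine -fuel',
    'engine "how to" -fuel',
    '"how to" engine',
];

$results = [];
foreach ($queries as $query) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$query] = preg_match($pattern, $query, $regs) === 1;
}

var_dump($results);
```

Only the first query matches; if the parts can appear in any order or number, a tokenizing approach with preg_match_all is likely the better fit.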

Hope this helps.

Upvotes: 1
