Reputation: 37458
I am trying to split a typical Google search string into its parts, e.g. the string could be: "how to" engine -fuel
So I want to get "how to", engine, and -fuel separately.
I tried the following preg_match_all, but I also get "how" and "to" as separate matches, which could make the result unnecessarily difficult to process.
preg_match_all(
'=(["]{1}[^"]{1,}["]{1})'
.'|([-]{1}[^ ]{1,}[ ]{1})'
.'|([^-"]{1}[^ ]{1,}[ ]{1})=si',
$filter,
$matches,
PREG_PATTERN_ORDER);
Does anyone have an idea how to do this right?
Upvotes: 2
Views: 96
Reputation: 170298
Try:
$q = '"how to" engine -fuel';
preg_match_all('/"[^"]*"|\S+/', $q, $matches);
print_r($matches);
which will print:
Array
(
    [0] => Array
        (
            [0] => "how to"
            [1] => engine
            [2] => -fuel
        )

)
Meaning:
"[^"]*" # match a quoted string
| # OR
\S+ # 1 or more non-space chars
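Once the string is tokenized this way, the tokens can be sorted by their first character. The following is only a minimal sketch: the function name splitSearchQuery and the array keys phrases, terms and excluded are made up for illustration and are not part of the question or of PHP itself.

```php
<?php
// Hypothetical helper: tokenize a search string and sort the tokens
// into quoted phrases, plain terms, and excluded (minus-prefixed) terms.
function splitSearchQuery(string $q): array
{
    $parts = ['phrases' => [], 'terms' => [], 'excluded' => []];

    // Same pattern as above: a quoted string, or a run of non-space chars.
    preg_match_all('/"[^"]*"|\S+/', $q, $matches);

    foreach ($matches[0] as $token) {
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            $parts['phrases'][] = substr($token, 1, -1); // strip the quotes
        } elseif (strlen($token) > 1 && $token[0] === '-') {
            $parts['excluded'][] = substr($token, 1);    // strip the minus
        } else {
            $parts['terms'][] = $token;
        }
    }

    return $parts;
}

$parts = splitSearchQuery('"how to" engine -fuel');
print_r($parts);
```

One limitation of this sketch: an excluded phrase such as -"how to" is not treated as a single token, because the quoted alternative only applies when the token starts with a quote.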
Upvotes: 2
Reputation: 11181
Try this
(?i)("[^"]+") +([a-z]+) +(\-[a-z]+)\b
code
if (preg_match('/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i', $subject, $regs)) {
$howto = $regs[1];
$engine = $regs[2];
$fuel = $regs[3];
} else {
$result = "";
}
Explanation
"
(?i) # Match the remainder of the regex with the options: case insensitive (i)
( # Match the regular expression below and capture its match into backreference number 1
\" # Match the character “\"” literally
[^\"] # Match any character that is NOT a “\"”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\" # Match the character “\"” literally
)
\ # Match the character “ ” literally
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ # Match the character “ ” literally
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 3
\- # Match the character “-” literally
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
"
Hope this helps.
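Note that this pattern expects exactly one quoted phrase, one word, and one minus-prefixed word, in that order. A quick check, with sample strings invented for illustration, might look like this:

```php
<?php
// The pattern from this answer, unchanged.
$pattern = '/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i';

// Sample inputs (made up for this check).
$samples = [
    '"how to" engine -fuel', // phrase, word, excluded word
    'engine "how to" -fuel', // same parts in a different order
    '"how to" engine',       // excluded word missing
];

$results = [];
foreach ($samples as $subject) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$subject] = preg_match($pattern, $subject, $regs) === 1;
    printf("%-24s %s\n", $subject, $results[$subject] ? 'match' : 'no match');
}
```

Only the first sample matches, so this approach fits best when the search string always has that fixed shape.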
Upvotes: 1