Reputation: 3226
I want to get the values of the command tags (GET, FROM, IN, etc.). My command is:
// My command
$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';
// My parser
if(preg_match_all('/(GET|FROM|IN)\s+([^\s]+)/si',$_cmd, $m))
$cmd = array_combine($m[1], $m[2]);
Output:
Array
(
[GET] => a,
[FROM] => p
[IN] => a
[from] => Sarajevo"
)
I am looking for this output:
Array
(
[GET] => a, b
[FROM] => p
[IN] => a and c="I am from Sarajevo" or d>1
)
As you can see, the problem is with whitespace and with command tags that reappear inside strings (like from). So how can I parse this command?
Upvotes: 3
Views: 146
Reputation: 6513
$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';
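// Case-sensitive on purpose: with the i modifier the " from " inside the quoted string would also split.
// PREG_SPLIT_NO_EMPTY drops the empty piece in front of the leading GET.
$tpar = preg_split('/\s+(GET|FROM|IN)\s+/', ' '.$_cmd.' ', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// array_walk($tpar, 'trim') would discard trim's return value; array_map keeps it.
$tpar = array_map('trim', $tpar);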
print_r($tpar);
// gives:
array(
[0] => GET
[1] => a, b
[2] => FROM
[3] => p
[4] => IN
[5] => a and c="I am from Sarajevo" or d>1
)
// The rest is straightforward: pair each keyword with the value that follows it.
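One way to finish the pairing step, as a minimal sketch: it assumes the split is case-sensitive, that empty pieces are dropped, and that every keyword is followed by a non-empty value (otherwise the pairs would shift). The variable name `$pairs` is only illustrative.

```php
<?php
$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';

// Split on the keywords and keep them in the result (case-sensitive).
$tpar = preg_split(
    '/\s+(GET|FROM|IN)\s+/',
    ' ' . $_cmd . ' ',
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
$tpar = array_map('trim', $tpar);

// Group the flat list into [keyword, value] pairs ...
$pairs = array_chunk($tpar, 2);
// ... then use element 0 of each pair as the key and element 1 as the value.
$cmd = array_column($pairs, 1, 0);

print_r($cmd);
```

For the command above this yields the array the question asks for, with `GET`, `FROM` and `IN` as keys. An uppercase keyword surrounded by spaces inside a quoted string would still be split, so this only suits input where that cannot happen.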
Upvotes: 1
Reputation: 145482
You cannot easily parse that with a single regex. (It's doable, but not simple.)
You should use a simple tokenizer, where a regex again becomes a useful tool:
// [^\w\s] instead of \W, so that whitespace is skipped rather than returned as tokens
preg_match_all('/\w+|".*?"|[^\w\s]/', $_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ', $list);
This gives you a simple list, where you just have to find the clauses that you are interested in, then remerge the subsequent tokens (though I'm confused about your use case):
[0] => Array
(
[0] => GET
[1] => a
[2] => ,
[3] => b
[4] => FROM
[5] => p
[6] => IN
[7] => a
[8] => and
[9] => c
[10] => =
[11] => "I am from Sarajevo"
[12] => or
[13] => d
[14] => >
[15] => 1
)
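The remerge step could look roughly like the sketch below. It is not part of the original answer: the function name `parse_clauses` and the keyword list are assumptions, and it uses `PREG_OFFSET_CAPTURE` so that each clause can be cut out of the original string instead of being glued back together from tokens (which would lose the original spacing).

```php
<?php
// Hypothetical helper: split a command into keyword => clause pairs.
function parse_clauses(string $cmd, array $keywords = ['GET', 'FROM', 'IN']): array
{
    // Tokens are words, double-quoted strings, or single non-space symbols.
    preg_match_all('/\w+|".*?"|[^\w\s]/', $cmd, $m, PREG_OFFSET_CAPTURE);

    // Remember where each keyword token starts and how long it is.
    // A quoted string is one token, so a "from" inside it never matches here.
    $marks = [];
    foreach ($m[0] as [$token, $offset]) {
        if (in_array(strtoupper($token), $keywords, true)) {
            $marks[] = [strtoupper($token), $offset, strlen($token)];
        }
    }

    // Each clause runs from the end of its keyword to the start of the next one.
    $clauses = [];
    foreach ($marks as $i => [$name, $offset, $length]) {
        $start = $offset + $length;
        $end = $marks[$i + 1][1] ?? strlen($cmd);
        $clauses[$name] = trim(substr($cmd, $start, $end - $start));
    }
    return $clauses;
}

$result = parse_clauses('GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ');
print_r($result);
```

Keywords are matched case-insensitively here, so an unquoted identifier named `in` or `from` would be taken as a keyword; whether that is acceptable depends on the command language.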
Upvotes: 8
Reputation: 11779
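if (preg_match_all('/\b(GET|FROM|IN)\s+((?:(?!\s+(?:GET|FROM|IN)\b).)+)/s', $_cmd, $m))
    $cmd = array_combine($m[1], array_map('trim', $m[2]));
This means: after each keyword, capture every character up to the whitespace that precedes the next GET, FROM or IN. The i modifier is dropped so that the lowercase from inside the quoted string is not treated as a keyword; an uppercase keyword inside quotes would still split the clause.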
Upvotes: 3
Reputation: 41070
You could remove the case-insensitive i modifier after the closing delimiter /. Also make sure there is at least one whitespace character after each keyword.
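Applied to the pattern from the question, that suggestion might look like the sketch below. It only shows the effect of dropping the modifier; the value part of the pattern is left as the question had it.

```php
<?php
$_cmd = 'GET a, b FROM p IN a and c="I am from Sarajevo" or d>1 ';

// No i modifier, and \s+ still requires whitespace after each keyword.
preg_match_all('/(GET|FROM|IN)\s+([^\s]+)/s', $_cmd, $m);
$cmd = array_combine($m[1], $m[2]);

print_r($cmd);
```

The stray from key disappears, but each value still stops at the first whitespace character (GET gives "a," and IN gives "a"), so this change alone does not produce the desired output.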
Upvotes: 1
Reputation: 7494
You need to develop a scripting language for this. Regexps aren't suitable for these purposes.
Upvotes: 1