Reputation: 73
I'd like to parse a string like the following:
'serviceHits."test_server"."http_test.org" 31987'
into an array like:
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
Basically I want to split on dots and spaces, treating strings within quotes as a single value.
The format of this string is not fixed; this is just one example. It might contain different numbers of elements, with quoted and numerical elements in different places.
Other strings might look like:
test.2 3 which should parse to [test|2|3]
test."342".cake.2 "cheese" which should parse to [test|342|cake|2|cheese]
test."red feet".3."green" 4 which should parse to [test|red feet|3|green|4]
Sometimes the OID string may also contain a quote mark, which should be included if possible, but that is the least important part of the parser:
test."a \"b\" c" "cheese face" which should parse to [test|a "b" c|cheese face]
I'm trying to parse, in a generic manner, SNMP OID strings from agents written by people with quite varying ideas of what an OID should look like.
Parsing the OID string (the bit separated by dots) and the return value (the last value) into separate named arrays would be nice. Simply splitting on the space before parsing the string wouldn't work, as both the OID and the value can contain spaces.
Thanks!
Upvotes: 1
Views: 1571
Reputation: 3722
Here is a solution that works with all of your test samples (plus one of my own) and allows you to escape quotes, dots, and spaces.
Due to the requirement of handling escape codes, a split is not really possible.
Although one can imagine a regex that matches the entire string with '()' to mark the separate elements, I was unable to get it working using preg_match or preg_match_all.
Instead I parsed the string incrementally, pulling off one element at a time. I then use stripslashes to unescape quotes, spaces, and dots.
<?php
$strings = array
(
    'serviceHits."test_server"."http_test.org" 31987',
    'test.2 3',
    'test."342".cake.2 "cheese"',
    'test."red feet".3."green" 4',
    'test."a \\"b\\" c" "cheese face"',
    'test\\.one."test\\"two".test\\ three',
);
foreach ($strings as $string)
{
    print "'{$string}' => " . print_r(parse_oid($string), true) . "\n";
}
/**
 * parse_oid parses an OID and returns an array of the parsed elements.
 * This is an all-or-none function, and will return NULL if it cannot completely
 * parse the string.
 * @param string $string The OID to parse.
 * @return array|NULL A list of OID elements, or null if error parsing.
 */
function parse_oid($string)
{
    $result = array();
    while (true)
    {
        $matches = array();
        // [1] = unquoted element, [2] = quoted element, [3] = separator or end of string
        $match_count = preg_match('/^(?:((?:[^\\\\\\. "]|(?:\\\\.))+)|(?:"((?:[^\\\\"]|(?:\\\\.))+)"))((?:[\\. ])|$)/', $string, $matches);
        // preg_match returns 1 on a match, 0 on no match, and false on error
        if ($match_count === 1)
        {
            // [1] = unquoted, [2] = quoted
            $value = strlen($matches[1]) > 0 ? $matches[1] : $matches[2];
            $result[] = stripslashes($value);
            // Are we expecting any more parts?
            if (strlen($matches[3]) > 0)
            {
                // I do this (vs keeping track of offset) to use ^ in regex
                $string = substr($string, strlen($matches[0]));
            }
            else
            {
                return $result;
            }
        }
        else
        {
            // All or nothing
            return null;
        }
    } // while
}
This generates the following output:
'serviceHits."test_server"."http_test.org" 31987' => Array
(
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
)
'test.2 3' => Array
(
[0] => test
[1] => 2
[2] => 3
)
'test."342".cake.2 "cheese"' => Array
(
[0] => test
[1] => 342
[2] => cake
[3] => 2
[4] => cheese
)
'test."red feet".3."green" 4' => Array
(
[0] => test
[1] => red feet
[2] => 3
[3] => green
[4] => 4
)
'test."a \"b\" c" "cheese face"' => Array
(
[0] => test
[1] => a "b" c
[2] => cheese face
)
'test\.one."test\"two".test\ three' => Array
(
[0] => test.one
[1] => test"two
[2] => test three
)
Upvotes: 1
Reputation: 3156
I agree that it is hard to find a single regexp that solves this.
Here's a complete solution:
$results = array();
$str = 'serviceHits."test_\"server"."http_test.org" 31987';
// Temporarily encode \" as something else
$str_encoded_quotes = strtr($str, array('\\"' => '####'));
// Split on the double-quoted strings, keeping them in the result
$str_arr = preg_split('/("[^"]*")/', $str_encoded_quotes, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($str_arr as $substr) {
    // If the value is a lone dot or space, do nothing
    if (!preg_match('/^[\s\.]$/', $substr)) {
        // If the value is between double-quotes, it's a string:
        // keep its content as is
        if (preg_match('/^"(.*)"$/', $substr)) {
            $substr = preg_replace('/^"(.*)"$/', '\1', $substr); // Remove the surrounding double-quotes
            $results[] = strtr($substr, array('####' => '"')); // Put escaped double-quotes back inside the string
        // Otherwise, it must be split
        } else {
            // Split on dot or space, dropping empty pieces
            $substr_arr = preg_split('/[\.\s]/', $substr, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($substr_arr as $subsubstr) {
                $results[] = strtr($subsubstr, array('####' => '"')); // Put escaped double-quotes back inside the string
            }
        }
    }
    // Otherwise it's a lone separator (dot or space): skip it
}
var_dump($results);
Tested with all of your new string examples.
First attempt (OLD)
Using preg_split :
$str = 'serviceHits."test_server"."http_test.org" 31987';
// -1 : no limit
// PREG_SPLIT_NO_EMPTY : do not return empty results
preg_split('/[\.\s]?"[\.\s]?/',$str,-1,PREG_SPLIT_NO_EMPTY);
Upvotes: 3
Reputation: 7157
The easiest way is probably to replace dots and spaces inside strings with placeholders, split, then remove the placeholders. Something like this:
$in = 'serviceHits."test_server"."http_test.org" 31987';
$a = preg_replace_callback('!"([^"]*)"!', 'quote', $in);
$b = preg_split('![. ]!', $a);
foreach ($b as $k => $v) $b[$k] = unquote($v);
print_r($b);
# the functions that do the (un)quoting
function quote($m){
    return str_replace(array('.',' '),
        array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'), $m[1]);
}
function unquote($str){
    return str_replace(array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'),
        array('.',' '), $str);
}
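The pattern "([^"]*)" stops at the first quote it sees, so an element such as "a \"b\" c" from the question gets cut short. Below is a rough sketch of how the same placeholder idea might be extended to tolerate escaped quotes. The function name split_with_placeholders is invented for this example, and the placeholder tokens are assumed never to occur in real input.

```php
<?php
// Hypothetical variant of the placeholder approach that also accepts \" inside quotes.
function split_with_placeholders($in)
{
    // Arbitrary placeholder tokens, assumed not to appear in the input itself.
    $real = array('.', ' ');
    $fake = array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE');

    // Match a quoted run in which a backslash plus any character counts as one
    // unit, then hide the dots and spaces found inside it.
    $masked = preg_replace_callback(
        '!"((?:[^"\\\\]|\\\\.)*)"!',
        function ($m) use ($real, $fake) {
            return str_replace($real, $fake, $m[1]);
        },
        $in
    );

    // Split on the separators that are still visible.
    $parts = preg_split('![. ]!', $masked);
    foreach ($parts as $k => $v) {
        // Restore the hidden characters, then drop the escaping backslashes.
        $parts[$k] = stripslashes(str_replace($fake, $real, $v));
    }
    return $parts;
}
```

Escapes outside quotes (for example test\.one) are still split on the dot here; only the quoted case changes.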
Upvotes: 2