tresstylez

Reputation: 1839

Need Regular Expression to match name/value pairs in different formats

I'm pulling in ASP/VBScript configuration files via PHP Curl to do some file processing and want to return some values.

The strings look like this:

config1 = ""  
config2 = "VALUE:1:0:9" 'strange value comment 
otherconfig = False 
yetanotherconfig = False 'some comment

Basically, these are name/value pairs separated by equals signs, with the value optionally enclosed in quotation marks and optionally followed by a comment.

I want to return the actual VALUE (False, VALUE:1:0:9, etc..) in ONE matching group regardless of the format the string is in.

Here's the pattern I'm passing to preg_match so far:

$pattern = '/\s*'.$configname.'\s*\=\s*(\".*?\"|.*?\r)/'

$configname is the name of the specific configuration i'm looking for, so I pass it in with a variable.

I'm still getting parentheses included back with the value (instead of the value itself), and I'm getting comments returned with the value as well.

Any help is appreciated!

Upvotes: 0

Views: 755

Answers (3)

Arnaud Le Blanc

Reputation: 99921

This one will work:

$pattern = '/
    \s*
    # name
    (?P<name>.*?)
    # =
    \s*=\s*
    # value
    (?P<val>
        "(?P<quoted>([^"]|\\\\"|\\\\\\\\)*)"
        |(?P<raw>.*?)
    )
    # comment
    \s*(?P<comment>\'.*)?
$/xm';

This will match every key=value pair in the input string, instead of just a specific one.

The regex takes care of quotes and escaped quotes (\") in quoted values (e.g. "foo\"bar").

Use it with a function like this:

function parse_config($string) {
    $pattern = '/
        \s*
        # name
        (?P<name>.*?)
        # =
        \s*=\s*
        # value
        (?P<val>
            "(?P<quoted>([^"]|\\\\"|\\\\\\\\)*)"
            |(?P<raw>.*?)
        )
        # comment
        \s*(?P<comment>\'.*)?
    $/xm';

    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

    $config = array();
    foreach($matches as $match) {
        $name = $match['name'];
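        // empty() would also reject the quoted value "0", so compare against '' instead
        if (isset($match['quoted']) && $match['quoted'] !== '') {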
            $value = str_replace(array('\\"','\\\\'), array('"','\\'), $match['quoted']);
        } else if (isset($match['raw'])) {
            $value = $match['raw'];
        } else {
            $value = '';
        }
        $config[$name] = $value;
    }

    return $config;
}

Example:

$string = "a = b\n
c=\"d\\\"e\\\\fgh\" ' comment";

$config = parse_config($string);

// output:

array('a' => 'b', 'c' => 'd"e\fgh');

Other example:

$string = <<<EOF
config1 = ""
config2 = "VALUE:1:0:9" 'strange value comment
otherconfig = False
yetanotherconfig = False 'some comment
EOF;

print_r(parse_config($string));

// output:

Array
(
    [config1] => 
    [config2] => VALUE:1:0:9
    [otherconfig] => False
    [yetanotherconfig] => False
)

Upvotes: 0

Pierre

Reputation: 1343

Returning the matching value in ONE matching group is difficult because of the double-quote alternative. Backreferences can help:

$pattern = '/\s*'.$configname.'\s*=\s*("?)(?<value>.*?)\1\s*(?:\'|$)/m';

should do the trick. Then use $result['value'].

In plain English, the pattern does the following:

  • skips optional spaces, the identifier, optional spaces, =, optional spaces (easy)
  • optionally matches a ", captured as \1 (the first capturing group)
  • lazily matches any characters, captured as value
  • matches \1 again (a " if one was matched before, otherwise nothing)
  • optionally matches some spaces
  • must then match either the ' that starts a comment or the end of the line
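
As a rough sketch of how the backreference approach could be used on a whole file: the helper name find_config_value and the sample $contents are assumptions made for illustration. The sketch also escapes the name with preg_quote, anchors the match at the start of a line (^ with the m modifier) so that config does not match inside otherconfig, and uses [ \t]* instead of \s* so a match cannot spill onto the next line.

```php
<?php
// Hypothetical helper (name chosen for this example): returns the value
// of one setting, or null when the name is not found.
function find_config_value($contents, $configname)
{
    // ("?) captures an optional opening quote; \1 requires the same thing
    // (a quote or nothing) after the lazily matched value.
    $pattern = '/^[ \t]*' . preg_quote($configname, '/')
             . '[ \t]*=[ \t]*("?)(?<value>.*?)\1[ \t]*(?:\'|\r?$)/m';

    if (preg_match($pattern, $contents, $result)) {
        return $result['value'];
    }
    return null;
}

// Sample input modelled on the lines in the question.
$contents = "config1 = \"\"\n"
          . "config2 = \"VALUE:1:0:9\" 'strange value comment\n"
          . "otherconfig = False\n"
          . "yetanotherconfig = False 'some comment\n";

echo find_config_value($contents, 'config2'), "\n";          // VALUE:1:0:9
echo find_config_value($contents, 'yetanotherconfig'), "\n"; // False
```

A missing name yields null rather than an empty string, so an empty value such as config1 = "" can be told apart from a setting that is not in the file.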

Without back references:

$pattern = '/\s*'.$configname.'\s*=\s*(?:"(.*?)"|(.*?))\s*(?:\'|$)/m';

This is more efficient, but the value ends up in $result[1] or $result[2], depending on whether it was quoted.
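
If you want the value in a single numbered group without a backreference, PCRE's branch-reset group (?|...) is one option: the capture in each alternative is numbered from the same starting point, so both land in group 1. This is only a sketch; the function name and the sample input are made up for the example.

```php
<?php
// Hypothetical helper (name chosen for this example): the quoted and the
// unquoted alternative both capture into group 1 thanks to (?|...).
function find_config_value_single_group($contents, $configname)
{
    $pattern = '/^[ \t]*' . preg_quote($configname, '/')
             . '[ \t]*=[ \t]*(?|"(.*?)"|(.*?))[ \t]*(?:\'|\r?$)/m';

    return preg_match($pattern, $contents, $result) ? $result[1] : null;
}

// Sample input modelled on the lines in the question.
$contents = "config1 = \"\"\n"
          . "config2 = \"VALUE:1:0:9\" 'strange value comment\n"
          . "otherconfig = False\n"
          . "yetanotherconfig = False 'some comment\n";

echo find_config_value_single_group($contents, 'config2'), "\n";     // VALUE:1:0:9
echo find_config_value_single_group($contents, 'otherconfig'), "\n"; // False
```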

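What went wrong in your pattern:

  • A backslash is only needed to escape the string delimiter itself (here the single quote) or to stop a character that is special in a regex (., ^, $, ...) from being interpreted; = and " need no escaping.
  • End of line is matched with $ (together with the m modifier when the subject has several lines), not with \r or \n.
  • Your pattern never excludes the comment.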
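
To make the point about $ concrete, here is a small sketch (the sample strings are invented for the example) showing how $ behaves with and without the m modifier, and that it stops being an anchor inside a character class:

```php
<?php
// Two-line sample subject, invented for this example.
$text = "a = False\nb = True\n";

// Without m, $ only matches at the very end of the subject (or just
// before a final newline), so "False" on the first line is not matched.
$without_m = preg_match('/False$/', $text);

// With m, $ also matches just before every newline inside the subject.
$with_m = preg_match('/False$/m', $text);

// Inside a character class, $ is a literal dollar sign, not an anchor.
$literal_dollar = preg_match('/[$]/', 'price: $5');
$class_at_end   = preg_match('/False[\'$]/', 'x = False');

var_dump($without_m, $with_m, $literal_dollar, $class_at_end);
// int(0) int(1) int(1) int(0)
```

To accept "a comment or the end of the line", an alternation such as (?:\'|$) keeps $ working as an anchor.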

Upvotes: 1

Lee Louviere

Reputation: 5262

\r is going to match a CR character (carriage return). You're essentially saying I want to match "???????" or ????????[carriage return]

Of course you'll get the apostrophe, you've matched it. You'll have to strip these things off.

$pattern = '/\s*'.$configname.'\s*=\s*(")?(.*?)(?(1)")\s*(?:\'|$)/m';

Upvotes: 0
