Arne Cordes

PHP parse .ini file problems with newlines / need regex?

I'm having trouble parsing .ini files whose values are not enclosed in quotes and contain newlines. Here is an example:

[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Screenshot=url-goes-here.png
Categories=some,categories

Vendor=abc

[Section2]
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,

 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Somekey=somevalue

When I try to parse this string with parse_ini_string($file_content, true, INI_SCANNER_RAW);, it returns either false or only the first line of Description, e.g.:

["Description"]=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod" // next lines are missing

I already tried to remove the newlines and enclose the values in quotes, but I can't find a regex that works. I need a pattern that matches each key/value pair up to the next key/value pair or until a comment begins.

Unfortunately, a key sometimes follows a blank line and sometimes doesn't, and values can themselves contain blank lines (see Description in Section2).

So the question is: how do I modify or clean up this string so that parse_ini_string can read it?

Casimir et Hippolyte

You can describe a multiline key/value with this pattern:

/^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m
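To see what this pattern matches on its own, here is a minimal sketch using preg_match_all() on a shortened, made-up sample in the question's layout (the sample string is an assumption, not the real file). Only entries whose value continues onto following lines should be reported:

```php
<?php
// Shortened, made-up sample in the question's layout (continuation lines start with a space).
$content = "[Section1]\nID=xyz\n\nFoo=BAR\n\n"
    . "Description=Lorem ipsum dolor sit amet,\n consectetur adipisicing elit.\n"
    . "Screenshot=url-goes-here.png\n";

// The m modifier makes ^ match at the start of every line.
$pattern = '/^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m';

preg_match_all($pattern, $content, $matches);

// Only the multiline Description entry is matched; ID, Foo and Screenshot are not.
foreach ($matches[0] as $entry) {
    echo $entry, "\n---\n";
}
```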

INI_SCANNER_NORMAL, the default scanner mode, allows multiline values enclosed in double quotes, so all you need to do is add the quotes:

$content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);

Pattern details:

~                  # pattern delimiter
^                  # start of the line
\w+                # key name
=
\K                 # discards everything matched so far from the match result
\N*                # zero or more characters except newlines
(?:                # non-capturing group: optional empty lines, then a non-empty line
    \R++           # one or more newlines
    (?!\w+=|[[#;]) # not followed by another key/value, a section or a comment
    \N+            # one or more characters except newlines
)+                 # at least one occurrence
~m                 # switch on the multiline mode, ^ means "start of the line"
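The breakdown above is also valid PCRE in extended mode, so, as a sketch, it could be kept in the code as-is with the x modifier: whitespace and "# ..." comments outside character classes are then ignored, while the # inside [[#;] stays literal. The snippet compares it with the compact form on a tiny made-up sample:

```php
<?php
// The commented pattern, usable directly thanks to the x (extended) modifier.
$extended = <<<'REGEX'
~
^                  # start of the line
\w+                # key name
=
\K                 # drop "key=" from the reported match
\N*                # rest of the first line
(?:                # optional empty lines, then a non-empty line
    \R++           # one or more newlines
    (?!\w+=|[[#;]) # not a key/value, a section or a comment
    \N+            # one or more characters except newlines
)+                 # at least one continuation line
~mx
REGEX;

$compact = '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m';

// Made-up sample: one multiline value followed by a single-line one.
$content = "[S]\nKey=first line\n second line\nOther=value\n";

// Both versions should report the same matches.
preg_match_all($extended, $content, $a);
preg_match_all($compact, $content, $b);
var_dump($a[0] === $b[0]);
```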

This pattern targets only multiline values, other values stay unquoted.
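A minimal end-to-end sketch of the quoting approach, assuming a shortened, made-up sample. It uses ; for the comment line, since PHP 7.0 removed support for # comments in files read by parse_ini_string():

```php
<?php
// Made-up sample in the question's layout, with a blank line inside Description.
$content = "[Section1]\nID=xyz\n\n; A comment\nFoo=BAR\n\n"
    . "Description=Lorem ipsum dolor sit amet,\n consectetur adipisicing elit,\n\n sed do eiusmod tempor.\n"
    . "Screenshot=url-goes-here.png\n";

// Wrap every multiline value in double quotes; single-line values are left alone.
$quoted = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);

// INI_SCANNER_NORMAL (the default) accepts quoted values spanning several lines.
$ini = parse_ini_string($quoted, true, INI_SCANNER_NORMAL);

var_dump($ini['Section1']['Description']);
```

Note that a double quote inside such a value would end the quoted string early, so this route assumes the values contain no " characters.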

Notes: I assumed that each key, comment and section starts at the beginning of a line. If that isn't the case (for example, with leading spaces), you can easily adapt the pattern by adding \h*+ after each newline.

If comments can also start in the middle of a line, change \N to [^#\r\n].


If you want to use the INI_SCANNER_RAW option instead, you must remove the newlines from the values:

$pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#;])([^#\r\n]+)~m';
$content = preg_replace($pattern, ' $1', $content);
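A minimal sketch of the INI_SCANNER_RAW route on a shortened, made-up sample. It uses the pattern with the m modifier, because without it ^ only matches at the very start of the string and no key/value line would ever be found, and it treats ; as a comment character like the first pattern does:

```php
<?php
// Made-up sample; continuation lines start with a space, one blank line inside the value.
$content = "[Section2]\n"
    . "Description=Lorem ipsum dolor sit amet,\n consectetur adipisicing elit,\n\n sed do eiusmod tempor.\n"
    . "Somekey=somevalue\n";

// \G(?!\A) continues right after the previous match, so each continuation line
// of the same value is handled in turn; m lets ^ match at every line start.
$pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#;])([^#\r\n]+)~m';

// Each run of newlines before a continuation line becomes a single space.
$flat = preg_replace($pattern, ' $1', $content);

$ini = parse_ini_string($flat, true, INI_SCANNER_RAW);

var_dump($ini['Section2']['Description']);
```

Because every continuation line in the sample already begins with a space, the flattened value contains double spaces; collapse them afterwards if that matters.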

The pattern matches, one at a time, each run of consecutive newline characters that is followed by a non-empty continuation line, and replaces that run with a single space.

Another way to do it is to use the first pattern again, but this time with preg_replace_callback and a simple character translation in the callback function. This approach is also useful if you want to escape special or problematic characters.
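A minimal sketch of the callback variant, assuming the first (multiline-only) pattern and strtr() as the character translation; the sample string is made up. CR and LF are each mapped to a space, so a blank line inside a value would leave several spaces:

```php
<?php
// Made-up sample with one multiline value.
$content = "[Section2]\n"
    . "Description=Lorem ipsum dolor sit amet,\n consectetur adipisicing elit.\n"
    . "Somekey=somevalue\n";

// Matches only multiline values; \K keeps "key=" out of the match.
$pattern = '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m';

$flat = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Translate CR and LF to spaces; other characters could be escaped here too.
        return strtr($m[0], "\r\n", '  ');
    },
    $content
);

$ini = parse_ini_string($flat, true, INI_SCANNER_RAW);

var_dump($ini['Section2']['Description']);
```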
