Reputation: 581
I have some trouble with parsing .ini files which have values not enclosed by quotes and some newlines in it. Here is an example:
[Section1]
ID=xyz
# A comment
Foo=BAR
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Screenshot=url-goes-here.png
Categories=some,categories
Vendor=abc
[Section2]
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Somekey=somevalue
When I try to parse this string with parse_ini_string($file_content, true, INI_SCANNER_RAW); it returns either false or just the first line of Description, e.g.:
["Description"]=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod" // next lines are missing
I already tried to remove the newlines and enclose the values in quotes, but can't find a regex that works. I need a pattern that matches each key/value until the next key/value or until a comment begins.
Unfortunately, sometimes the key begins after a blank line, sometimes not. And values can have blank lines in them (look at Description in Section2).
So the question is: how do I modify/clean up this string so that it is readable by parse_ini_string?
Upvotes: 3
Views: 664
Reputation: 89574
You can describe a multiline key/value with this pattern:
/^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m
The default INI_SCANNER_NORMAL option allows multiline values enclosed between quotes, so all you need is to add quotes:
$content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);
pattern details:
~ # pattern delimiter
^ # start of the line
\w+ # key name
=
\K # discards characters on the left from the match result
\N* # zero or more characters except newlines
(?: # non-capturing group: optional empty lines followed by a non-empty line
\R++ # one or more newlines
(?!\w+=|[[#;]) # not followed by another key/value, a section or a comment
\N+ # one or more characters except newlines
)+ # at least one occurrence
~m # switch on the multiline mode, ^ means "start of the line"
This pattern targets only multiline values, other values stay unquoted.
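To see the quoting step end to end, here is a minimal sketch. The sample input is made up for illustration (it uses a ; comment and short values), and the variable names $quoted and $data are not from the question.

```php
<?php
// Hypothetical sample input: one value spans several lines, including a blank one.
$content = "[Section1]\nID=xyz\n; A comment\n"
         . "Description=first line\nsecond line\n\nthird line\n"
         . "Vendor=abc\n";

// Wrap only the multiline values in double quotes; \K keeps "key=" out of $0.
$quoted = preg_replace(
    '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m',
    '"$0"',
    $content
);

// The default scanner mode accepts quoted values that span several lines.
$data = parse_ini_string($quoted, true);

print_r($data);
```

Note that this sketch does not escape double quotes that may already appear inside a value; if your data can contain them, they need to be handled before quoting.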
Notes: I assumed that every key, comment and section starts at the beginning of a line. If that isn't the case (for example, with leading spaces), you can easily adapt the pattern by adding \h*+ after each newline.
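One possible reading of that adaptation, as a sketch only: \h*+ is placed after the line start and after each run of newlines, before the lookahead. The indented sample input is made up for illustration.

```php
<?php
// Hypothetical input where keys and continuation lines are indented.
$content = "[Section1]\n  Description=first line\n  second line\n  Vendor=abc\n";

// \h*+ is possessive, so the lookahead is always tested at the first
// non-blank character of the line and cannot be bypassed by backtracking.
$quoted = preg_replace(
    '~^\h*+\w+=\K\N*(?:\R++\h*+(?!\w+=|[[#;])\N+)+~m',
    '"$0"',
    $content
);

echo $quoted;
```

The indentation of continuation lines stays inside the quoted value, so you may want to trim it afterwards.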
If comments are allowed anywhere in a line, change \N to [^#\r\n].
If you want to use the INI_SCANNER_RAW option, you must remove newlines in values:
$pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#])([^#\r\n]+)~m'; // m modifier: ^ must match at the start of each line
$content = preg_replace($pattern, ' $1', $content);
The pattern matches, one at a time, each group of consecutive newline characters that is followed by a non-empty line, and replaces those newlines with a single space.
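A short sketch of this newline-removal approach follows. The sample input and the names $flat and $data are made up for illustration, and the pattern is written with the m modifier so that ^ can match at the start of every line.

```php
<?php
// Hypothetical sample input with a value spread over several lines.
$content = "[Section1]\nID=xyz\n"
         . "Description=first line\nsecond line\n\nthird line\n"
         . "Vendor=abc\n";

// \G(?!\A) continues from the end of the previous match, so every
// continuation line of the same value is joined in turn.
$pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#])([^#\r\n]+)~m';
$flat = preg_replace($pattern, ' $1', $content);

// Each value now sits on a single line, which the raw scanner can read.
$data = parse_ini_string($flat, true, INI_SCANNER_RAW);

print_r($data);
```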
Another way to do it is to use the first pattern, but this time with preg_replace_callback, and perform a simple character translation in the callback function. This approach is also useful if you want to escape special or problematic characters.
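As a rough sketch of the callback variant, assuming the translation wanted is "line-break character to space" (strtr is my choice here, any other translation or escaping could go in the callback). The sample input is made up for illustration.

```php
<?php
// Hypothetical sample input with a two-line value.
$content = "[Section1]\nDescription=first line\nsecond line\nVendor=abc\n";

$flat = preg_replace_callback(
    '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m',
    function ($m) {
        // Translate every \r and \n inside the matched value into a space.
        // This is also the place to escape other problematic characters.
        return strtr($m[0], "\r\n", '  ');
    },
    $content
);

$data = parse_ini_string($flat, true, INI_SCANNER_RAW);

print_r($data);
```

Because the translation works character by character, a blank line or a \r\n line ending leaves two spaces rather than one.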
Upvotes: 3