Reputation: 85
I'm having a strange issue with preg_replace. It seems to behave normally with single strings, but when I pass in the contents of a large text file (~1.5MB) it seems to do nothing.
I'm trying to parse a large text file of key-values which has this kind of structure:
"KeyValues"
{
"Key1" "Value1"
// a comment
"ComplexKey"
{
"ComplexKey1" "ComplexValue1" // another comment
"ComplexKey2" "ComplexValue2"
"FurtherComplexity1"
{
"ComplexKey3" "ComplexValue3"
"ComplexKey4" "ComplexValue4"
}
}
}
I'm trying to remove the comments from the text file before I do any parsing. preg_replace seemed like a safe bet. Here's the code for just removing the comments:
<?php
$filecontent = file_get_contents('file.txt');
$filecontent = preg_replace('!//.*!s', '', $filecontent);
echo $filecontent;
?>
Now I expect it to output the example above without the comments, but it just returns the exact same string it starts out with. Where it gets weird, though, is if I take a single line out of the text file, for example this one:
"ComplexKey1" "ComplexValue1" // another comment
I can run the preg_replace call on that string and it'll return the string without the comment. I thought maybe it was because there were some newline characters not matching the regular expression, so I added the 's' modifier to the expression; however, this didn't seem to fix the problem. For whatever reason, my preg_replace call just won't do anything (or my regular expression is off).
An obvious solution would be just to ignore comments in the parsing, but I'm thinking there must be something I'm missing here as to why this isn't working. I'd really like to fix this without changing the parser, if at all possible. Any ideas?
Upvotes: 1
Views: 3382
Reputation: 11211
This looks like JSON. Can you use json_decode($mydata, true) to turn your whole text file into a nice multi-level PHP array?
Upvotes: -1
Reputation: 255035
$filecontent = preg_replace('!//.*$!m', '', $filecontent);
The m modifier changes the processing of the input text to line by line:
By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
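To make the difference between the modifiers concrete, here is a minimal sketch run on a shortened copy of the sample from the question. The sample text and the variable names ($text, $default, $multiline, $dotall) are illustrative assumptions, not taken from the real file. It compares the pattern with no modifier, with m, and with s; with s, the dot also matches newlines, so everything from the first // to the end of the string is removed rather than just the comment.

```php
<?php
// Shortened sample modeled on the structure shown in the question (illustrative only).
$text = <<<'EOT'
"KeyValues"
{
"Key1" "Value1"
// a comment
"ComplexKey1" "ComplexValue1" // another comment
}
EOT;

// No modifier: "." does not match a newline, so each match stops at the end of its line.
$default = preg_replace('!//.*!', '', $text);

// m modifier: "$" matches before every newline, so each comment is removed line by line.
$multiline = preg_replace('!//.*$!m', '', $text);

// s modifier: "." also matches newlines, so the first "//" and everything after it is removed.
$dotall = preg_replace('!//.*!s', '', $text);

echo $multiline;
```

On this sample, the no-modifier and m variants produce the same result, while the s variant cuts the text off at the first comment. Note also that preg_replace returns null when the regex engine reports an error, so for a large input it may be worth checking the return value and preg_last_error() before echoing.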
Upvotes: 3