preg_match fails to find simple regex

Question

Due to some NDAs, the amount of information I can really disclose here is small. Unfortunately, nobody where I am has an answer for me, so I'm turning to Stack Overflow. The basics are these: in PHP, I am downloading a large-ish file (73,000 characters) from an SVN repository over HTTP (either with cURL or file_get_contents), and searching it for rules. All the rules are annotated with @rule, so the regex to find them ought to be

/(?<=@RULE).+?$/im

I've tested it, and it works. The problem is that even though the file is downloading properly and being converted to a string (var_dumps have ensured this),

preg_match('/RU/',$file, $rules);

leaves $rules completely empty, despite the fact that I can SEE the appropriate matches in the var_dumped strings. I'm at my wit's end trying to figure out what's going on. No errors are being thrown (it returns 0), it doesn't seem to be running out of memory, it just tells me "Nope, nothing in there, George." Interestingly, it will find

/R/

just fine. Any ideas out there?

Ja͢ck · Accepted Answer

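Since you're only matching ASCII, the only thing I can think of is that the text is encoded as UTF-16, which, for ASCII characters, puts a '\0' byte next to each character (after it in little-endian order, before it in big-endian order). That would also explain why /R/ matches but /RU/ doesn't: the R and the U are no longer adjacent bytes.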

If that's the case, run this before calling preg_match():

$file = mb_convert_encoding($file, 'UTF-8', 'UTF-16');
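
If you're not sure which byte order the file uses, here is a rough sketch of how you could check before converting. The helper name to_utf8_if_utf16 is made up for illustration, the NUL-counting heuristic assumes mostly-ASCII text, and the sample string stands in for your real download. It needs the mbstring extension. Naming the byte order explicitly avoids relying on how plain 'UTF-16' treats data that has no byte-order mark.

```php
<?php
// Hypothetical helper, not part of the original answer: converts $data to
// UTF-8 when it looks like UTF-16, otherwise returns it unchanged.
function to_utf8_if_utf16(string $data): string
{
    // A byte-order mark identifies the encoding directly.
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
    }

    // Assumption: mostly-ASCII text. Count NUL bytes at even and odd offsets
    // in a sample; UTF-16LE puts them at odd offsets, UTF-16BE at even ones.
    $sample = substr($data, 0, 1024);
    $even = 0;
    $odd = 0;
    for ($i = 0, $n = strlen($sample); $i < $n; $i++) {
        if ($sample[$i] === "\0") {
            if ($i % 2 === 0) {
                $even++;
            } else {
                $odd++;
            }
        }
    }
    if ($even === 0 && $odd === 0) {
        return $data; // no NUL bytes, so probably not UTF-16
    }
    $from = $odd >= $even ? 'UTF-16LE' : 'UTF-16BE';
    return mb_convert_encoding($data, 'UTF-8', $from);
}

// Simulated download (assumption): UTF-16LE text without a byte-order mark.
$file = mb_convert_encoding("// @RULE first rule\n", 'UTF-16LE', 'UTF-8');

var_dump(preg_match('/R/', $file));  // int(1): a single byte still matches
var_dump(preg_match('/RU/', $file)); // int(0): a NUL byte sits between R and U

$file = to_utf8_if_utf16($file);
preg_match('/(?<=@RULE).+?$/im', $file, $rules);
var_dump($rules[0]); // string(11) " first rule"
```

This reproduces the symptom from the question (/R/ matches, /RU/ does not) and shows the original pattern working once the string is UTF-8. If the NUL counts come out empty for your file, the encoding is probably not the problem.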
