Mikhael Djekson

Reputation: 183

preg_match UTF-8 problem: unknown symbols instead of Cyrillic

My script works fine, but today, after checking the logs, I found some garbled words. After analysing them I understood that the problem is related to UTF-8: the files are parsed and the title is extracted, but instead of Russian words the result is unknown symbols (Сериалы ТУТ! СериÐ).

I use:

$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $match);

//$match[1] = Сериалы ТУТ! СериРsda

When I add the /u pattern modifier nothing changes; I get the same garbled text. Please help.

Maybe there is something with PHP?

Upvotes: 0

Views: 154

Answers (1)

Casimir et Hippolyte

Reputation: 89557

It is not a PHP or a regex problem, but an HTML problem: the extracted bytes are fine, but the browser decodes them with the wrong character set. To get a correct display, you must add <meta charset="UTF-8"/> to the head of your HTML code.
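A minimal sketch of that fix, assuming the extracted title is echoed into a page the script builds itself. The helper name `renderPage` is hypothetical; `header()`, `headers_sent()` and `htmlspecialchars()` are standard PHP functions. Sending the charset in the HTTP header as well is an addition on top of the meta tag, not something the question's setup is known to need.

```php
<?php
// Hypothetical helper: wraps a title in an HTML page that declares UTF-8.
function renderPage(string $title): string
{
    // Escape the title for HTML output, treating the input as UTF-8.
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         . "<meta charset=\"UTF-8\"/>\n"
         . "<title>{$safe}</title>\n"
         . "</head>\n<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

// Also declare the encoding in the HTTP response header (skipped on the CLI).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo renderPage('Сериалы ТУТ! Сериалы онлайн');
```

If the garbled text shows up in a log file rather than in a browser, the same idea applies: the viewer has to be told to read the file as UTF-8.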

As an aside: the U modifier is unnecessary here. A lazy quantifier does the same job:

preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);
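To check that the match itself is intact, here is a small sketch that runs the pattern on the string from the question and verifies the captured bytes. The variable names `$title` and `$isValidUtf8` are illustrative, and the `u` modifier is added here as an assumption that the input really is UTF-8. The empty pattern with `u` is a common idiom for validating UTF-8: `preg_match('//u', ...)` returns 1 only for a well-formed string.

```php
<?php
$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";

// Lazy quantifier instead of the U modifier; u treats pattern and subject as UTF-8.
if (preg_match('~<title[^>]*>(.*?)</title>~siu', $cont, $match) === 1) {
    $title = $match[1];
} else {
    $title = '';
}

// An empty pattern with the u modifier matches only well-formed UTF-8.
$isValidUtf8 = (preg_match('//u', $title) === 1);

var_dump($title === 'Сериалы ТУТ! Сериалы онлайн sda');
var_dump($isValidUtf8);
```

Both checks should report true, which would confirm that the extraction is correct and that the garbling happens later, when the text is displayed.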

Upvotes: 2
