Remco

Reputation: 357

PHP query not using UTF-8 charset

I am extracting URLs and link titles from a post's content, but the titles no longer seem to be UTF-8: they include stray characters such as "Â" when I echo the result. Any idea why the correct charset isn't being used? My headers do declare the right charset.

I tried some of the solutions on here, but none seems to work so I thought I'd add my code below - just in case I'm missing something.

$servername = "localhost";
$database = "xxxx";
$username = "xxxxx";
$password = "xxxx";
$conn = mysqli_connect($servername, $username, $password, $database);


$post_id = 228;

$content_post = get_post($post_id);
$content = $content_post->post_content;
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);

$links = $doc->getElementsByTagName('a');


$counter = 0;
foreach ($links as $link) {

    $href = $link->getAttribute('href');
    $avoid = array('.jpg', '.png', '.gif', '.jpeg');

    if ($href == str_replace($avoid, '', $href)) {

        $title = $link->nodeValue;
        $title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');

        $sql = "INSERT INTO wp_urls_download (title, url) VALUES ('$title', '$href')";
        if (mysqli_query($conn, $sql)) {
            $counter++;
            echo "Entry" . $counter . ": $title" . "<br>";
        } else {
            echo "Error: " . $sql . "<br>" . mysqli_error($conn);
        }

    }

}

Updated Echo string - changed this after I initially uploaded the code. I have already tried the solutions in the other posts and was not successful.

Upvotes: 0

Views: 1329

Answers (2)

Rick James

Reputation: 142208

It seems that you have "double-encoding". What you expected was

Transverse Abdominis (TVA)

But what you have for the space before the parenthesis is a special space that probably came from Microsoft Word, then got converted to utf8 twice. In hex: A0 -> c2a0 -> c382c2a0.

Yes, the link to "utf8 all the way through" would ultimately provide the fix, but I think you need more help.

The A0 was converted from latin1 to utf8; those bytes were then treated as if they were still latin1, and the conversion was repeated.
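To make the byte sequence concrete, here is a minimal sketch that reproduces the double conversion in PHP. It assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1 (they agree for the A0 byte); the variable names are illustrative.

```php
<?php
// One latin1 byte: 0xA0 is the non-breaking space.
$original = "\xA0";

// First conversion (correct): latin1 -> UTF-8.
$once = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Second conversion (the bug): the UTF-8 bytes are misread as latin1
// and converted to UTF-8 again.
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), ' -> ', bin2hex($once), ' -> ', bin2hex($twice), PHP_EOL;
// prints: a0 -> c2a0 -> c382c2a0

// Undoing one layer recovers the correctly encoded string.
$repaired = mb_convert_encoding($twice, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $once); // bool(true)
```

The extra c382 at the front is the UTF-8 encoding of "Â", which is why that character shows up in the echoed titles.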

The connection needs to declare the client's encoding, via $mysqli_obj->set_charset('utf8') (or similar).

Then the column in the table should be CHARACTER SET utf8mb4 (or utf8). Verify with SHOW CREATE TABLE. (It is probably latin1 currently.)
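As a rough sketch of that check, the helper below pulls the table's default charset out of the text that SHOW CREATE TABLE returns. The function name table_default_charset and the sample output (including its column definitions) are made up for illustration; it reads only the table-level default, not a per-column CHARACTER SET override.

```php
<?php
// Hypothetical helper: extract the table's default charset from the
// statement text returned by SHOW CREATE TABLE.
function table_default_charset(string $createTableSql): ?string
{
    if (preg_match('/DEFAULT CHARSET=(\w+)/i', $createTableSql, $m)) {
        return strtolower($m[1]);
    }
    return null; // no table-level default found in the text
}

// Illustrative output only; the real column definitions are not known.
$sample = "CREATE TABLE `wp_urls_download` (\n"
        . "  `title` varchar(255) DEFAULT NULL,\n"
        . "  `url` varchar(255) DEFAULT NULL\n"
        . ") ENGINE=InnoDB DEFAULT CHARSET=latin1";

echo table_default_charset($sample), PHP_EOL; // prints: latin1
```

With a live connection, the statement text is the second column of the row returned by running SHOW CREATE TABLE wp_urls_download through mysqli_query.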

HTML should start with <meta charset=UTF-8>.

See also: "Trouble with UTF-8 characters; what I see is not what I stored".

Upvotes: 1

Michael Tijhuis

Reputation: 173

Did you try to set the utf8 charset on the connection?

$conn->set_charset('utf8');

For more information: http://php.net/manual/en/mysqli.set-charset.php
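In the question's procedural style, that could look something like the sketch below. The wrapper name connect_with_charset is hypothetical, and defaulting to utf8mb4 rather than utf8 is an assumption; the point is only that the charset is set right after connecting and before any query runs.

```php
<?php
// Hypothetical helper (not from the question): open a connection and
// declare the client encoding before any query is sent.
function connect_with_charset(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8mb4' // assumption; 'utf8' also works for this data
): mysqli {
    $conn = mysqli_connect($host, $user, $pass, $db);
    if (!$conn) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // Tell the server which encoding the client sends and expects.
    if (!mysqli_set_charset($conn, $charset)) {
        throw new RuntimeException('set_charset failed: ' . mysqli_error($conn));
    }
    return $conn;
}
```

The question's script would then call $conn = connect_with_charset($servername, $username, $password, $database); in place of the bare mysqli_connect() call.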

Upvotes: 2
