Vishal Khialani

Reputation: 2587

Encoding issues in Drupal when importing from WordPress

I am currently moving blog posts from WordPress to Drupal. However, after the move some of the text is not displayed correctly.

WordPress displays: When it hasn’t (HTML code is <h2>When it hasn’t</h2>)

Drupal displays: When it hasnâ€™t (HTML code is <h2>When it hasn’t</h2>)

In both the WordPress and Drupal databases the value is correct, and the source is the same: <h2>When it hasn’t</h2>

I searched and found many suggestions, but none of them helped. Below are the ones I have tried and checked.

1) I double-checked that UTF-8 is the character encoding in both Drupal and WordPress. I also made a simple test.php file to rule out interference from anything else, and it still did not display correctly.

2) I made sure UTF-8 is used when we take a mysqldump and import it into Drupal.

3) I also made sure the .php file is saved as UTF-8.

4) I tried every encoding option available in Chrome and none of them displayed the text correctly.

5) I also used php functions to recode it but they did not work.

$value2="<h2>When it hasn’t</h2>";

$out = recode_string('..utf-8', $value2);
//output - When it hasnt

$out2= mb_convert_encoding($value2,'UTF-8', "UTF-8");
// output  - When it hasn’t


$out3= @iconv('UTF-8', 'utf-8', $value2);
// output - When it hasn’t

I have run out of options and I am stuck. Please help.

Upvotes: 1

Views: 196

Answers (1)

Raffaele

Reputation: 20885

You say the text in both databases is correct, but this doesn't tell you much: to view the content of a record you must use some client, and several transformations may happen along the way depending on how that client renders the text for you to read.

So only two things matter:

  1. the encoding of the column
  2. the encoding of the HTML page returned by Drupal

Since your page outputs â€™ (the bytes 0xE2 0x80 0x99 read as CP1252) for ’ (Unicode U+2019, which UTF-8 encodes as 0xE2 0x80 0x99), I guess the column is indeed UTF-8, but something between the database and the browser thinks the text is CP1252. This is what you have to check:
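The byte-level mismatch described above can be reproduced in a few lines of PHP. This is only a minimal sketch, assuming the iconv extension is available and knows the name CP1252; the variable names are illustrative. It also shows why a UTF-8 to UTF-8 conversion, as tried in the question, cannot change anything.

```php
<?php
// U+2019 (right single quotation mark) encoded as UTF-8: three bytes.
$utf8 = "\xE2\x80\x99";

// Misread those three bytes as CP1252 and re-encode the result as UTF-8.
// This mimics a component that wrongly assumes the text is CP1252.
$garbled = iconv('CP1252', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";   // e28099
echo $garbled, "\n";         // the three characters a-circumflex, euro sign, trademark sign

// Converting UTF-8 to UTF-8 leaves the bytes untouched, so it cannot
// repair (or cause) this kind of damage.
$same = iconv('UTF-8', 'UTF-8', $utf8);
var_dump($same === $utf8);   // bool(true)
```

If the garbled line matches what the Drupal page shows, the stored bytes are most likely fine and the wrong assumption is made somewhere after the database.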

  • If using MySQL, the connection encoding must be UTF-8 so that what you have in your PHP script is UTF-8 text. You can use SET NAMES 'utf8' (MySQL spells the character set name without a hyphen; utf8mb4 covers the full Unicode range). Note that if you don't need the Unicode set, you can even use CP1252 (latin1 in MySQL): the only important thing is that you know the encoding, since PHP strings are just byte arrays.
  • Explicitly define the response encoding in the HTTP Content-Type header. I mean, configure Drupal to call header('Content-Type: text/html; charset=utf-8');
  • If the HTTP response encoding differs from the one used for the text retrieved from the db, transcode the query result accordingly.
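The last two points can be sketched as follows. This is a hedged illustration, not Drupal's API: transcode_for_response() is a hypothetical helper, the sample string stands in for a query result, and it assumes the connection delivered CP1252 bytes while the page is served as UTF-8. It relies on the iconv extension.

```php
<?php
// Hypothetical helper (not part of Drupal or PHP): converts text fetched
// from the database into the encoding announced in the HTTP response.
function transcode_for_response(string $text, string $sourceCharset, string $responseCharset): string
{
    if (strcasecmp($sourceCharset, $responseCharset) === 0) {
        return $text; // same encoding: the bytes can be sent as-is
    }
    $converted = iconv($sourceCharset, $responseCharset, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to $responseCharset");
    }
    return $converted;
}

// Assumption for the demo: the connection returned CP1252 bytes,
// where 0x92 is the right single quotation mark.
$fromDb = "When it hasn\x92t";
$responseCharset = 'UTF-8';

$contentType = 'Content-Type: text/html; charset=' . strtolower($responseCharset);
$body = transcode_for_response($fromDb, 'CP1252', $responseCharset);

// In a real web request, header($contentType) would be called before any output.
echo $contentType, "\n";
echo $body, "\n";
```

When the connection encoding and the response encoding already agree, the helper returns the text unchanged, which is the situation the first two checks aim for.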

Upvotes: 3
