Gooseman

Reputation:

CURL import character encoding problem

I'm using cURL to import some code. However, in French, all the characters come out funny. For example: Bonjour ...

I don't have access to change anything in the imported code. Is there anything I can do on my side to fix this?

Thanks

Upvotes: 5

Views: 22036

Answers (5)

rmontagud

Reputation: 153

I'm currently suffering from a similar problem: I'm trying to write a simple HTML <title> importer via cURL. So here is an outline of what I've done so far:

  1. Retrieve the HTML via cURL.
  2. Check for an encoding hint in the response headers via curl_getinfo() and match it with a regex.
  3. Parse the HTML to look at the content-type meta tag and the <title> tag (yes, I know the consequences).
  4. Compare the charset from the header with the one from the meta tag and choose the meta one if they differ, because we know no one cares about their httpd configuration and there are a lot of dirty workarounds using it.
  5. iconv() the string.
  6. Wish every day that whenever someone does not follow the standards, $DEITY punishes him/her until the end of days, because it would save me the meta parsing.
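
Steps 2 to 5 above could be sketched roughly as follows. This is only an illustration, not the exact code I use: the function names (charsetFromContentType, charsetFromMeta, htmlToUtf8, extractTitle) are made up for the example, the regexes are deliberately naive, and falling back to UTF-8 when nothing is declared is an assumption.

```php
<?php
// Hypothetical helpers; the names are illustrative and not from any library.

// Step 2: extract the charset from a Content-Type value such as
// "text/html; charset=ISO-8859-1". Returns null when none is declared.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Step 3: look for <meta charset="..."> or the http-equiv Content-Type variant.
function charsetFromMeta(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Steps 4 and 5: prefer the meta declaration over the header, assume UTF-8
// when neither is present, then convert to UTF-8 with iconv().
function htmlToUtf8(string $html, ?string $contentType): string
{
    $charset = charsetFromMeta($html) ?? charsetFromContentType($contentType) ?? 'UTF-8';
    // iconv() returns false on an unknown charset or invalid input.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Pull the <title> text out of the already-converted HTML.
function extractTitle(string $html): ?string
{
    return preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) ? trim($m[1]) : null;
}
```

With a cURL handle $ch, the header value passed as $contentType would come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after curl_exec().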

Upvotes: 2

Rid Iculous

Reputation: 3952

I had a similar problem. I tried to loop through all combinations of input and output charsets. Nothing helped! :(

However, I was able to access the code that actually fetched the data, and this is where the culprit lay. The data was fetched via cURL. Adding

 curl_setopt($ch,CURLOPT_BINARYTRANSFER,true);

fixed it.

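A handy snippet for trying out all possible combinations from a list of charsets:

// $text_2_convert must already hold the string whose encoding is in doubt.
$charsets = array(
    "UTF-8",
    "ASCII",
    "Windows-1252",
    "ISO-8859-15",
    "ISO-8859-1",
    "ISO-8859-6",
    "CP1256"
);

// Try every source/target pair and print each result for visual comparison.
foreach ($charsets as $from) {
    foreach ($charsets as $to) {
        // iconv() returns false when the input is not valid in $from
        // or cannot be represented in $to.
        $converted = @iconv($from, $to, $text_2_convert);
        echo "<h1>Combination $from to $to produces: </h1>"
            . ($converted === false ? "(conversion failed)" : $converted);
    }
}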

Upvotes: 7

Ben

Reputation: 55

You could replace your

$data = curl_exec($ch);

with

$data = utf8_decode(curl_exec($ch));

I had this same issue and it worked well for me.
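
Note that utf8_decode() converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2. A rough equivalent could look like the sketch below; the helper name utf8ToLatin1 is made up for this example, and it assumes the fetched data really is UTF-8.

```php
<?php
// Hypothetical helper: converts UTF-8 text to ISO-8859-1, like utf8_decode().
function utf8ToLatin1(string $utf8): string
{
    if (function_exists('mb_convert_encoding')) {
        // mbstring substitutes characters that have no ISO-8859-1 equivalent.
        return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    }
    // Fallback: iconv() returns false when a character cannot be represented,
    // in which case the input is handed back unchanged.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    return $converted === false ? $utf8 : $converted;
}
```

This only helps when the page that finally displays the text is itself interpreted as ISO-8859-1. Also check that curl_exec() did not return false before passing its result in.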

Upvotes: 3

Judder

Reputation:

PHP seems to use UTF-8 by default, so I found that the following works:

$text = iconv("UTF-8","Windows-1252",$text);

Upvotes: 3

Alekc

Reputation: 4770

As Jon Skeet pointed out, it's difficult to understand your situation. However, if you only have access to the final text, you can try using iconv to change the text encoding.

For example:

$text = iconv("Windows-1252","UTF-8",$text);

I had a similar issue some time ago (with Italian and its special characters) and solved it this way.

Try different combinations (UTF-8, ISO-8859-1, Windows-1252).
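
Rather than trying combinations by hand, one option is to check first whether the text is already valid UTF-8 and convert only when it is not. A minimal sketch, assuming the source is either UTF-8 or one of a few single-byte charsets; the helper name ensureUtf8 and the fallback order are my own choices for illustration:

```php
<?php
// Hypothetical helper: return $text as UTF-8, guessing among a few source charsets.
function ensureUtf8(string $text, array $fallbacks = ['Windows-1252', 'ISO-8859-1']): string
{
    // With the /u modifier preg_match() fails on invalid UTF-8, so a
    // successful match of the empty pattern means $text is already UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    foreach ($fallbacks as $charset) {
        // iconv() returns false if $text contains bytes undefined in $charset.
        $converted = @iconv($charset, 'UTF-8', $text);
        if ($converted !== false) {
            return $converted;
        }
    }
    return $text; // nothing worked; hand back the original bytes
}
```

Single-byte charsets cannot be told apart reliably from the bytes alone, so the fallback list is still a guess and should be adjusted for the language involved.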

Upvotes: 14
