Reputation: 2412
I have an php script which calls another web page and writes all the html of the page and everything goes ok however there is a charset problem. My php file encoding is utf-8 and all other php files work ok (that means there is no problem with server). What is the missing thing in that code and all spanish letters look weird. PS. When I wrote these weird characters original versions into php, they all look accurate.
header("Content-Type: text/html; charset=utf-8");
function file_get_contents_curl($url)
{
$ch=curl_init();
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
$data=curl_exec($ch);
curl_close($ch);
return $data;
}
$html=file_get_contents_curl($_GET["u"]);
$doc=new DOMDocument();
@$doc->loadHTML($html);
Upvotes: 34
Views: 140180
Reputation: 437
You Can use this header
header('Content-type: text/html; charset=UTF-8');
and after decoding the string
$page = utf8_decode(curl_exec($ch));
It worked for me
Upvotes: 16
Reputation: 4288
The best way I have tried before is to use urlencode()
. Keep in mind, don't use it for the whole url; instead, use it only for the needed parts. For example, a request that has two 'text-fa' and 'text-en' fields and they contain a Persian and an English text, respectively, you might only need to encode the Persian text, not the English one.
However, there are better ways if the range of characters have to be encoded is more limited. One of these ways is using CURLOPT_ENCODING
, by passing it to curl_setopt()
:
curl_setopt($ch, CURLOPT_ENCODING, "");
Upvotes: 3
Reputation: 319
$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);
Upvotes: 4
Reputation: 5215
I was fetching a windows-1252 encoded file via cURL and the mb_detect_encoding(curl_exec($ch));
returned UTF-8. Tried utf8_encode(curl_exec($ch));
and the characters were correct.
Upvotes: 3
Reputation: 93
function page_title($val){
include(dirname(__FILE__).'/simple_html_dom.php');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$val);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$return = curl_exec($ch);
$encot = false;
$charset = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);
$html = str_get_html('"'.$return.'"');
if(strpos($charset,'charset=') !== false) {
$c = str_replace("text/html; charset=","",$charset);
$encot = true;
}
else {
$lookat=$html->find('meta[http-equiv=Content-Type]',0);
$chrst = $lookat->content;
preg_match('/charset=(.+)/', $chrst, $found);
$p = trim($found[1]);
if(!empty($p) && $p != "")
{
$c = $p;
$encot = true;
}
}
$title = $html->find('title')[0]->innertext;
if($encot == true && $c != 'utf-8' && $c != 'UTF-8') $title = mb_convert_encoding($title,'UTF-8',$c);
return $title;
}
Upvotes: 3
Reputation: 406
Simple:
When you use curl it encodes the string to utf-8
you just need to decode them..
Description
string utf8_decode ( string $data )
This function decodes data , assumed to be UTF-8
encoded, to ISO-8859-1
.
Upvotes: 39