Reputation: 531
I'm trying to fetch this page with cURL and save the result as an HTML page. I used this code:
$url = "https://web.archive.org/web/20160202021236/http://www.mpshopfashion.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // timeout in seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (e.g. 301)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification (insecure)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0');
$html = curl_exec($ch);
if ($html === false) {
    die('cURL error: ' . curl_error($ch));
}
The resulting HTML page looks correct when I open it in a browser, but when I open it in a text editor I see text like this:
à¤Ã×èͧ»ÃдѺῪÑè¹ à¤Ã×èͧ»ÃдѺῪÑè¹à¡ÒËÅÕ ÊÃéÍÂ¤Í ÊÃéÍ¢éÍÁ×Í µèÒ§ËÙ ¢Ò»ÅÕ¡-¢ÒÂÊè§
instead of this:
เครื่องประดับแฟชั่น เครื่องประดับแฟชั่นเกาหลี สร้อยคอ สร้อยข้อมือ ต่างหู ขายปลีก-ขายส่ง
Reputation: 146430
Web sites typically declare their encoding in HTTP headers. Note the Content-Type response header for this page in Firefox Developer Tools: it declares the TIS-620 charset.
TIS-620 is a legacy encoding that was widely used for Thai text (nowadays UTF-8 has largely superseded such encodings).
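To read that header from PHP instead of from DevTools, here is a minimal sketch. `curl_getinfo()` with `CURLINFO_CONTENT_TYPE` returns the Content-Type of the last transfer (or null if none was sent); the helper `charsetFromContentType` is a hypothetical name, and its regex is a simplification that does not implement the full header grammar.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type value,
// e.g. "text/html; charset=TIS-620" -> "TIS-620". Returns null if there is none.
// Simplified regex; it does not cover every edge case of the header syntax.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType === null || $contentType === '') {
        return null;
    }
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m) === 1) {
        return strtoupper($m[1]);
    }
    return null;
}

// Usage with the question's handle, after curl_exec($ch):
// $charset = charsetFromContentType(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
```

The result is upper-cased only so that comparisons against a known label are case-insensitive; charset names are case-insensitive anyway.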
Your editor needs a setting to select the encoding, access to suitable fonts and, of course, support for that specific encoding. RJ TextEd, for example, offers such a setting.
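If changing the editor's setting isn't an option, another approach is to convert the downloaded bytes to UTF-8 before saving, since most editors default to UTF-8. The garbled text in the question is exactly TIS-620 bytes read as Latin-1/Windows-1252: `à` is byte 0xE0 and `¤` is 0xA4, which in TIS-620 are เ and ค. Where the system's iconv supports the name, `iconv('TIS-620', 'UTF-8', $html)` should do the conversion in one call. The sketch below spells out the mapping by hand instead (the function names are hypothetical) so it needs no extension. It relies on TIS-620 placing Thai characters at bytes 0xA1 to 0xDA and 0xDF to 0xFB, which map to U+0E01 to U+0E5B by a fixed offset of 0x0D60; it does not handle the extra characters Windows-874 adds in the 0x80 to 0xA0 range and replaces those with U+FFFD.

```php
<?php
// Encode one BMP code point as UTF-8 (enough for ASCII, Thai and U+FFFD).
function utf8FromCodePoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode TIS-620 bytes into a UTF-8 string.
// ASCII passes through; Thai bytes map to U+0E01..U+0E5B by adding 0x0D60;
// any other high byte (undefined in TIS-620) becomes U+FFFD.
function tis620ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);
        } elseif (($b >= 0xA1 && $b <= 0xDA) || ($b >= 0xDF && $b <= 0xFB)) {
            $out .= utf8FromCodePoint($b + 0x0D60);
        } else {
            $out .= utf8FromCodePoint(0xFFFD);
        }
    }
    return $out;
}
```

For example, `file_put_contents('page.html', tis620ToUtf8($html));` (the file name is illustrative). Any charset declared in the page's own `<meta>` tag would then need to be changed to UTF-8 as well, otherwise a browser opening the saved file may decode it incorrectly.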
As a fallback (after all, HTTP headers do not exist outside HTTP), HTML provides <meta> tags as an alternative way to declare the encoding. This page has:
<meta http-equiv="Content-Type" content="text/html; charset=windows-874"/>
In this case it doesn't even match the HTTP header (windows-874 vs. TIS-620).
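To check the `<meta>` declaration from PHP as well, here is a rough sketch. The helper name `charsetFromMeta` is hypothetical, and a single regex only approximates the careful prescan browsers actually perform; it covers both the `<meta charset="...">` form and the `http-equiv` form shown above.

```php
<?php
// Hypothetical helper: return the charset declared in the first <meta> tag that
// has one, upper-cased, or null. Handles <meta charset="..."> and
// <meta http-equiv="Content-Type" content="...; charset=...">.
// Rough approximation only; not the HTML standard's prescan algorithm.
function charsetFromMeta(string $html): ?string
{
    if (preg_match('/<meta\b[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)/i', $html, $m) === 1) {
        return strtoupper($m[1]);
    }
    return null;
}
```

For this page it should return `WINDOWS-874` while the header says TIS-620. In practice browsers treat the two the same: the WHATWG Encoding Standard lists `tis-620` as a label for windows-874, and a charset from the HTTP header takes precedence over a `<meta>` declaration anyway.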
Again, whether it inspects <meta> tags to detect the encoding depends on the particular editor you are using (which you haven't named). There is no universal solution that works automatically in every editor.
Reputation: 102
It's probably caused by wrong encoding settings, either on the website or in the cURL request. Why not use a wrapper around cURL, which is really hard to configure correctly? I can recommend Guzzle for this:
https://github.com/guzzle/guzzle