Reputation: 795
I'm building an application that interacts with the Twitter API.
So far my code handles the responses correctly and I am happy with the way I am interacting with the Search API. I am, however, stuck when it comes to the actual content of the Twitter API responses.
Right now, I search for tweets with specific hashtags using the Atom feed, i.e.
$url = 'http://search.twitter.com/search.atom?q=' . urlencode($hash_tag);
$ch = curl_init($url);
// Return the response body as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$xml = curl_exec($ch);
curl_close($ch);
$twelement = new SimpleXMLElement($xml);
echo "<pre>";
foreach ($twelement->entry as $entry) {
    echo $entry->author->name;
    echo '<br />';
    echo mb_detect_encoding($entry->author->name);
    echo '<br />';
}
echo "</pre>";
I have been trying different PHP functions to decode/convert to the correct character encoding, but no matter what I do, I always end up with the wrong output.
My output from this code is (crossed out for privacy):
xxxxxx (xxxxx xxxxxxx)
ASCII
xxxx_xxxxx (Chinny ♥_♥)
UTF-8
kunlemyk ((˘̯˘ ) hardekhunley™)
UTF-8
xxxx_xxxxx (♥ify okwuosa♥)
UTF-8
xxx_xxxx (Call me DRO)
ASCII
Why are some ASCII and some UTF-8? How can I ensure they are consistent? Can I convert them to ASCII? I'm pretty lost here. I have been stuck on this for ages and would really appreciate some help.
Regards,
Andrew
Upvotes: 2
Views: 1189
Reputation: 31813
UTF-8 was specifically designed so that ASCII is a proper subset of it. This was done for backwards compatibility.
A function that detects an encoding usually does so by educated guessing after inspecting the byte values. If the string in question contains nothing but ASCII characters, it could be called either ASCII or UTF-8. Again, this is because an ASCII string is a valid UTF-8 string by design.
It makes more sense to call a pure ASCII string "ASCII", because that is more specific: when guessing, all you know for sure is that every byte you've encountered was ASCII. If the string contains at least one non-ASCII (multibyte) UTF-8 character and the rest is ASCII, the function should detect it as UTF-8. But without seeing at least one such character, it would be wrong to claim the string is specifically UTF-8.
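To see this for yourself, here is a minimal sketch. The helper name describe_encoding() and the sample strings are made up for illustration; the heart character is written as raw UTF-8 bytes so the result does not depend on how the file is saved. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: reports the detector's guess and whether the
// bytes also form a valid UTF-8 string.
function describe_encoding($text)
{
    return array(
        // Strict detection against an explicit candidate list.
        'guess' => mb_detect_encoding($text, array('ASCII', 'UTF-8'), true),
        // mb_check_encoding() only validates; it does not guess.
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'),
    );
}

// Made-up samples resembling the author names in the question.
$samples = array('Call me DRO', "Chinny \xE2\x99\xA5_\xE2\x99\xA5");
foreach ($samples as $sample) {
    $info = describe_encoding($sample);
    printf(
        "%s => guess: %s, valid UTF-8: %s\n",
        $sample,
        $info['guess'],
        $info['valid_utf8'] ? 'yes' : 'no'
    );
}
```

Both samples should pass the UTF-8 validity check, even if the guess for the plain one comes back as ASCII, which is why treating everything as UTF-8 is safe.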
Edit: as for what to do about it? Again, an ASCII string is a valid UTF-8 string, so you should just use UTF-8, as that will work for both types. Make sure to declare this via a real HTTP header, not a <meta> tag.
header('Content-Type: text/html; charset=utf-8');
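Putting that together, a rough sketch of the output side might look like the following. The render_name() helper is hypothetical, and escaping with htmlspecialchars() is my own assumption about how you would want to print feed values into HTML; the sample strings are made up.

```php
<?php
// Hypothetical helper: escape a feed value for an HTML page served as
// UTF-8. Non-ASCII characters pass through unchanged.
function render_name($name)
{
    return htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8');
}

// Declare the charset in a real HTTP header, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// The same code path handles pure ASCII and multibyte UTF-8 names.
echo render_name('Call me DRO'), '<br />';
echo render_name("Chinny \xE2\x99\xA5_\xE2\x99\xA5"), '<br />';
```

There is no need to convert anything to ASCII here: the ASCII names are already valid UTF-8, so one header and one output path covers every entry.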
Upvotes: 2
Reputation: 354
Take a look at this post.
You might want to search for methods to detect encoding.
Upvotes: 0