Reputation: 4321
I'm attempting to retrieve a remote HTML page with cURL. However, when I analyze the text that gets returned, I'm noticing a lot of odd characters like ▀Ã, which makes me think that something went wrong with the text encoding somewhere along the line.
How can I ensure that the text I get back from cURL is properly encoded, and how can I normalize it so I can safely store results in a database without any encoding issues?
Upvotes: 1
Views: 2309
Reputation: 9
You need to include the following at the top of your page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Upvotes: -1
Reputation: 5147
I hope you have set CURLOPT_ENCODING to "" and that the page itself is not actually full of the gibberish you are seeing. The second thing I can suggest is to run the string through something like an HTML entities encoder to sanitise it. cURL simply gets or posts the data and, IMHO, doesn't change the encoding.
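Note that CURLOPT_ENCODING governs the Accept-Encoding header (gzip/deflate decompression), not the character set, so the charset still has to be handled after the transfer. A minimal, language-agnostic sketch of that step in Python follows, assuming you already have the raw response bytes and the Content-Type header value. The name `decode_body`, the 2048-byte sniffing window, and the UTF-8 then cp1252 fallback order are illustrative assumptions, not anything cURL provides.

```python
import re


def decode_body(body: bytes, content_type: str = "") -> str:
    """Decode raw response bytes into a Unicode string (hypothetical helper)."""
    candidates = []

    # 1. charset parameter from the HTTP Content-Type header, if present
    m = re.search(r"charset=[\"']?([\w.:-]+)", content_type, re.I)
    if m:
        candidates.append(m.group(1))

    # 2. charset declared in a <meta> tag near the top of the document
    m = re.search(rb"<meta[^>]+charset=[\"']?([\w.:-]+)", body[:2048], re.I)
    if m:
        candidates.append(m.group(1).decode("ascii"))

    # 3. assumed fallbacks: UTF-8 first, then Windows-1252
    candidates += ["utf-8", "cp1252"]

    for name in candidates:
        try:
            return body.decode(name)
        except (LookupError, UnicodeDecodeError):
            # unknown charset label, or bytes invalid in that charset
            continue

    # Last resort: keep what is decodable and mark the rest
    return body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    raw = "café".encode("iso-8859-1")
    text = decode_body(raw, "text/html; charset=ISO-8859-1")
    utf8_bytes = text.encode("utf-8")  # normalized form to store
    print(text, utf8_bytes)
```

Once the text is decoded, re-encoding it as UTF-8 before writing it out gives one consistent encoding to store, provided the database column and connection are also set to UTF-8.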
Upvotes: 5