konr

Reputation: 2565

How can I guess the charset of an HTML document?

Some malformed or incomplete HTML pages carry no charset information, and I have to figure out how to display them. Since there are dozens of encodings, I wonder whether there is an algorithm I can use to detect the right one. Is there such a thing?

Thanks!

Upvotes: 0

Views: 85

Answers (1)

Zimbabao

Reputation: 8240

Try jchardet or chsdet. Character set detection is probabilistic, so it may guess wrong in some cases. I used jchardet successfully a few years ago.
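
To give a feel for what such a guess involves, here is a minimal sketch in Python using only the standard library. It is not how jchardet or chsdet work internally (those rely on statistical models); it only checks for a byte order mark, then tries a strict UTF-8 decode, and otherwise falls back to a default. The function name `guess_charset` and the `windows-1252` fallback are assumptions made for illustration.

```python
import codecs

# BOM signatures; UTF-32 is listed before UTF-16 because the UTF-32 LE
# mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_charset(data: bytes, fallback: str = "windows-1252") -> str:
    """Return a best-guess codec name for raw bytes (hypothetical helper)."""
    # 1. A byte order mark, if present, is the strongest hint.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Non-ASCII text in a legacy encoding rarely forms valid UTF-8,
    #    so a successful strict decode is good evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Assumed default; a real detector would score candidates here.
        return fallback
```

Usage would look like `text = raw.decode(guess_charset(raw), errors="replace")`. Pure ASCII input is reported as UTF-8, which is harmless since ASCII is a subset of it. Telling legacy encodings apart (for example Shift_JIS versus windows-1251) needs the statistical approach the libraries above take.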

Upvotes: 1
