Vitaliy Vostrikov

Reputation: 631

Parsing HTML and decoding characters with Dart

I'm trying to parse HTML from a URL. The page's charset is "windows-1251", while the content my method outputs is UTF-8.

I'm using the http and html packages like this:

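import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;
import 'package:http/http.dart' as http;

Future<Document> getContentFrom(String uri, {List<String>? selectors}) async {
  // These are request headers; they do not tell http.read how to decode the response.
  final headers = {'Content-type': 'text/html', 'charset': 'windows-1251'};

  // http.read returns the response body already decoded to a String.
  final htmlForParse = await http.read(Uri.parse(uri), headers: headers);

  // `encoding` only matters when parse() receives bytes; a String is parsed as-is.
  return parse(htmlForParse, encoding: 'utf-8');
}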

http.read returns:

'<a href="#"><img src="#" alt="Ðîáîò Parrot Jumping Sumo (÷åðíûé)"/></a>'

This only happens when the page I fetch uses a charset other than UTF-8.

The full Dart code is here: https://github.com/Rasarts/mini.parser/blob/master/lib/parser.dart

The final output looks like "Parrot Jumping Sumo (÷åðíûé)", but I expected "Parrot Jumping Sumo (черный)".
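The garbled text is consistent with the windows-1251 bytes having been decoded as Latin-1 (ISO-8859-1), which turns every byte into the code point with the same value. If that assumption holds, no data was lost: latin1.encode from dart:convert gives the original bytes back. The helper name recoverBytes below is hypothetical; this is a diagnostic sketch, not part of the original code.

```dart
import 'dart:convert';

/// Hypothetical helper: recovers the raw bytes from a String that was
/// decoded as Latin-1. Latin-1 maps byte N to code point N, so
/// latin1.encode reverses that decoding exactly (for chars <= U+00FF).
List<int> recoverBytes(String misDecoded) => latin1.encode(misDecoded);

void main() {
  final bytes = recoverBytes('÷åðíûé');
  // Prints the bytes in hex: "f7 e5 f0 ed fb e9".
  print(bytes.map((b) => b.toRadixString(16)).join(' '));
}
```

The printed bytes f7 e5 f0 ed fb e9 are exactly the windows-1251 codes for "черный", so decoding those same bytes with a windows-1251 table instead of Latin-1 should give the expected text.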

How can I fix the "÷åðíûé" text?

Upvotes: 1

Views: 977

Answers (1)

Vitaliy Vostrikov

I made a small function for this purpose: https://github.com/Rasarts/mini.parser/blob/master/lib/cp1251.dart
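The linked cp1251.dart is not reproduced here. As a hypothetical sketch of the same idea (the name decodeCp1251 and its partial table are assumptions, not the answer's code), a windows-1251 decoder can rely on two facts: bytes below 0x80 are plain ASCII, and bytes 0xC0 to 0xFF map contiguously onto the Cyrillic letters А to я (U+0410 to U+044F). Only a few bytes from 0x80 to 0xBF (Ё, ё, dashes, guillemets, №, no-break space) are mapped; the rest of that range becomes U+FFFD.

```dart
import 'dart:convert';

/// Hypothetical, deliberately incomplete table for windows-1251 bytes
/// in 0x80-0xBF that are common on Russian pages.
const Map<int, int> _cp1251Extras = {
  0x96: 0x2013, // en dash
  0x97: 0x2014, // em dash
  0xA0: 0x00A0, // no-break space
  0xA8: 0x0401, // Ё
  0xAB: 0x00AB, // «
  0xB8: 0x0451, // ё
  0xB9: 0x2116, // №
  0xBB: 0x00BB, // »
};

/// Decodes windows-1251 bytes to a Dart String (partial sketch).
String decodeCp1251(List<int> bytes) {
  final buffer = StringBuffer();
  for (final b in bytes) {
    if (b < 0x80) {
      // ASCII range is identical in windows-1251.
      buffer.writeCharCode(b);
    } else if (b >= 0xC0 && b <= 0xFF) {
      // 0xC0-0xFF map contiguously onto А..я (U+0410..U+044F).
      buffer.writeCharCode(0x0410 + (b - 0xC0));
    } else {
      // Unmapped bytes become the Unicode replacement character.
      buffer.writeCharCode(_cp1251Extras[b] ?? 0xFFFD);
    }
  }
  return buffer.toString();
}

void main() {
  // Repairing a String that was already decoded as Latin-1:
  const garbled = 'Ðîáîò Parrot Jumping Sumo (÷åðíûé)';
  // Prints "Робот Parrot Jumping Sumo (черный)".
  print(decodeCp1251(latin1.encode(garbled)));
}
```

To skip the Latin-1 round trip altogether, one option with package:http is to fetch with http.get(Uri.parse(uri)), pass response.bodyBytes to a decoder like this, and give the resulting String to parse().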

Upvotes: 5
