Reputation: 522
I need to read an XML file from the web that is encoded as ISO-8859-1. After creating an XmlDocument from it, I tried to convert the InnerText of some nodes to UTF-8, but that didn't work. Then I tried to change the encoding on the HttpClient side. The response string is now properly formatted, but when I create the XmlDocument the app either crashes with the exception HRESULT: 0xC00CE55F or the XML string contains unexpected characters. How can I solve this issue?
Code Snippet:
private static async Task<string> GetResultsAsync(string uri)
{
    var client = new HttpClient();
    var response = await client.GetByteArrayAsync(uri);
    var responseString = Encoding.GetEncoding("iso-8859-1").GetString(response, 0, response.Length - 1);
    return responseString;
}
public static async Task GetPodcasts(string url)
{
    var progrmas = await GetGroupAsync("prog");
    HttpClient client = new HttpClient();
    //Task<string> pedido = client.GetStringAsync(url);
    //string res = await pedido; //Gets the string with the wrong chars, LoadXml doesn't fail
    string res = await GetResultsAsync(url); //Gets the string properly formatted
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(res); //Crashes here
    XmlElement root = doc.DocumentElement;
    XmlNodeList nodes = root.SelectNodes("//item");
    //Title
    var node_titles = root.SelectNodes("//item/title");
    IEnumerable<string> query_titles = from nodess in node_titles select nodess.InnerText;
    List<string> list_titles = query_titles.ToList();
    //........
    for (int i = 0; i < list_titles.Count; i++)
    {
        PodcastItem podcast = new PodcastItem();
        string title = list_titles[i];
        //First attempt to convert a field from the XmlDocument, with the wrong chars. Only replaces the bad encoding with a '?':
        //Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        //Encoding utf8 = Encoding.UTF8;
        //byte[] utfBytes = utf8.GetBytes(title);
        //byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
        //string msg = iso.GetString(isoBytes, 0, isoBytes.Length - 1);
        PodcastItem dataItem = new PodcastItem(title + pubdate, title, link, description, "", pubdate);
        progrmas.Items.Add(dataItem);
    }
}
Upvotes: 0
Views: 2337
Reputation: 42414
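I'm not sure why you are fiddling with the encoding yourself, but the reason it crashes so badly on you is probably that you dropped the last byte of the array: the third argument of GetString is a byte count, so passing response.Length - 1 cuts off the final byte of the document. This code works for me: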
static async Task<string> LoadDecoded()
{
    var client = new HttpClient();
    var response = await client.GetByteArrayAsync("http://www.rtp.pt/play/podcast/469");
    var responseString = Encoding
        .GetEncoding("iso-8859-1")
        .GetString(response, 0, response.Length); // no -1 here, we want all bytes!
    return responseString;
}
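To see the effect of the missing byte without hitting the network, here is a minimal sketch. It assumes the desktop System.Xml XmlDocument (not the WinRT one), and the sample XML, the class name TruncationDemo, and the helpers Decode and IsWellFormed are made up for illustration; they are not part of the original feed or code.

```csharp
using System;
using System.Text;
using System.Xml;

static class TruncationDemo
{
    // Hypothetical sample document standing in for the real feed.
    // \u00E7 and \u00E3 are the Latin-1 characters "c with cedilla" and "a with tilde".
    const string Sample = "<rss><item><title>Programa\u00E7\u00E3o</title></item></rss>";

    // Decodes the first `count` bytes as ISO-8859-1.
    public static string Decode(byte[] bytes, int count)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes, 0, count);
    }

    // Returns true when the string parses as XML, false when the parser rejects it.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(Sample);

        // All bytes: the closing '>' of </rss> is present.
        Console.WriteLine(IsWellFormed(Decode(bytes, bytes.Length)));     // True

        // One byte short: the document ends in "</rss" and no longer parses.
        Console.WriteLine(IsWellFormed(Decode(bytes, bytes.Length - 1))); // False
    }
}
```

If the real document happens to end with a trailing newline, dropping one byte would be harmless, which is why the truncation is only the probable cause.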
If I let the HttpClient figure out the encoding, your code works for me as well:
static async Task<string> Load()
{
    var hc = new HttpClient();
    string s = await hc.GetStringAsync("http://www.rtp.pt/play/podcast/469");
    return s;
}
static void Main(string[] args)
{
    var xd = new XmlDocument();
    string res = Load().Result;
    xd.LoadXml(res);
    var node_titles = xd.DocumentElement.SelectNodes("//item/title");
    Console.WriteLine(node_titles.Count);
}
If you are on a non-mobile/non-WinRT platform, XmlDocument.Load accepts a stream and does the same:
static async Task<Stream> LoadStream()
{
    var hc = new HttpClient();
    var stream = await hc.GetStreamAsync("http://www.rtp.pt/play/podcast/469");
    return stream;
}
static void Main(string[] args)
{
    var xd2 = new XmlDocument();
    xd2.Load(LoadStream().Result);
    var node_titles2 = xd2.DocumentElement.SelectNodes("//item/title");
    Console.WriteLine(node_titles2.Count);
}
This is the result in my Console:
Are you sure you are not encoding somewhere else as well?
As general advice: the framework classes can handle the most common encoding scenarios. Try to let them do the work instead of fiddling with the Encoding classes yourself.
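As a rough illustration of letting the framework do the work, the sketch below feeds raw ISO-8859-1 bytes to XmlDocument.Load through a MemoryStream and relies on the encoding named in the XML declaration. It assumes the desktop System.Xml XmlDocument; the sample document, the class name StreamEncodingDemo, and the helper ReadFirstTitle are hypothetical and stand in for the real feed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class StreamEncodingDemo
{
    // Loads the raw bytes as a stream; the XML parser picks the encoding
    // from the declaration, so no manual Encoding call is needed here.
    public static string ReadFirstTitle(byte[] rawBytes)
    {
        var doc = new XmlDocument();
        using (var stream = new MemoryStream(rawBytes))
        {
            doc.Load(stream);
        }
        return doc.SelectSingleNode("//item/title").InnerText;
    }

    static void Main()
    {
        // Hypothetical sample document that declares ISO-8859-1.
        string xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<rss><item><title>Programa\u00E7\u00E3o</title></item></rss>";

        // Encoding is used only to build the test bytes, as a server would send them.
        byte[] raw = Encoding.GetEncoding("iso-8859-1").GetBytes(xml);

        Console.WriteLine(ReadFirstTitle(raw) == "Programa\u00E7\u00E3o"); // True
    }
}
```

The only place the Encoding class appears is in producing the sample bytes; the parsing side never touches it.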
Upvotes: 1