Richard Todd

Reputation: 2486

Read non-ASCII chars from XML

I've built a small program that reads the XML output from Google Maps API geocode service and parses the string using LINQ to XML.

If the returned XML contains non-ASCII characters, my output breaks. Is there a way to read or decode the response differently?

Here is a snapshot of the key part of the code.

    public static void Read(IList<string> LocationDetails, string Type)
    {
        using (WebClient webClient = new WebClient())
        {
            webClient.Proxy = null;

            for(int i = 0; i < 5; i++)
            {
                //Generate geocode request and read XML file to string
                string request = String.Format("Https://maps.google.com/maps/api/geocode/xml?{0}={1}&sensor=false", Type, LocationDetails[i]);
                string locationXML = webClient.DownloadString(request);
                XElement root = XElement.Parse(locationXML);

                //Check if request is OK or otherwise
                if (root.Element("status").Value != "OK")
                {
                    //Skip to next iteration if status not OK
                    continue;
                }
            }

..... (some declaration code skipped here; StateName is declared as a string)

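    try
    {
        // Take the long_name of the first address_component whose type is administrative_area_level_1
        StateName = (result.Elements("address_component")
         .Where(x => (string)x.Element("type") == "administrative_area_level_1")
         .Select(x => x.Element("long_name").Value).First());
    }
    catch (InvalidOperationException)
    {
        // First() throws when no matching element exists
        StateName = null;
    }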

Upvotes: 1

Views: 681

Answers (1)

Martin Liversage

Reputation: 106826

I believe the Google web service returns XML encoded as UTF-8. However, if the character set is absent from the HTTP Content-Type header, the WebClient.DownloadString method decodes the returned bytes using the WebClient.Encoding property, which defaults to Encoding.Default. On .NET Framework this is the system's "ANSI" code page, which in most cases is not UTF-8, so multi-byte characters come out garbled.

To fix this, perform the following assignment before calling webClient.DownloadString(request):

    webClient.Encoding = Encoding.UTF8;
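
To illustrate why the decoding step matters, here is a minimal offline sketch. It does not call the Google service: the XML fragment, the EncodingDemo class and the LongName helper are hypothetical names made up for this example, and ISO-8859-1 is only an assumed stand-in for whatever "ANSI" code page a given machine uses.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class EncodingDemo
{
    // Hypothetical helper: decode raw response bytes with the given encoding,
    // then read <long_name> the same way the question does with LINQ to XML.
    public static string LongName(byte[] payload, Encoding encoding)
    {
        string xml = encoding.GetString(payload);
        return XElement.Parse(xml).Element("long_name").Value;
    }

    public static void Main()
    {
        // Made-up fragment standing in for a UTF-8 encoded geocode response.
        // \u00E9 is 'é', which UTF-8 stores as two bytes (0xC3 0xA9).
        byte[] payload = Encoding.UTF8.GetBytes(
            "<address_component><long_name>Qu\u00E9bec</long_name></address_component>");

        // Assumption: ISO-8859-1 plays the role of a single-byte "ANSI" code page.
        Encoding ansiLike = Encoding.GetEncoding("ISO-8859-1");

        string correct = LongName(payload, Encoding.UTF8); // "Québec"
        string garbled = LongName(payload, ansiLike);      // "QuÃ©bec": each byte became its own character

        Console.WriteLine(correct);
        Console.WriteLine(garbled);
    }
}
```

The XML still parses in both cases, which is why the failure shows up as mangled text rather than as an exception.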

Upvotes: 3
