Reputation: 11597
I am using the Yahoo Placefinder service to find some latitude/longitude positions for a list of addresses I have in a CSV file.
I am using the following code:
String reqURL = "http://where.yahooapis.com/geocode?location=" + HttpUtility.UrlEncode(location) + "&appid=KGe6P34c";
XmlDocument xml = new XmlDocument();
xml.Load(reqURL);
XPathNavigator nav = xml.CreateNavigator();
// process xml here...
I just found a very stubborn error that, for several days, I incorrectly thought was due to Yahoo forbidding further requests from me.
It is for this URL:
http://where.yahooapis.com/geocode?location=31+Front+Street%2c+Sedgefield%2c+Stockton%06on-Tees%2c+England%2c+TS21+3AT&appid=KGe6P34c
My browser complains about a parsing error for that URL. My C# program says it has a 500 error.
The location string here comes from this address:
Agape Business Consortium Ltd.,[email protected],Michael A Cutbill,Director,,,9 Jenner Drive,Victoria Gardens,,Stockton-on-Tee,,TS19 8RE,,England,85111,Hospitals,www.agapesolutions.co.uk
I think the error comes from the first hyphen in Stockton-on-Tee, but I can't explain why this is. If I replace this hyphen with a 'normal' hyphen, the query goes through successfully.
Is this error due to a fault at my end (the HttpUtility.UrlEncode function being incorrect?) or a fault at Yahoo's end?
Even though I can see what is causing this problem, I don't understand why. Could someone explain?
EDIT:
Further investigation on my part indicates that the character this hyphen is being encoded to, "%06", is the ASCII control character "Acknowledge" ("ACK"). I have no idea why this character would turn up here. It seems that different places render Stockton-on-Tee in different ways: it appears normal when opened in a text editor, but by the time it appears in Visual Studio, before being encoded, it is Stocktonon-Tees. Note that, when I copied the previous into this text box in Firefox, the hyphen rendered as a weird, square box character, but on this subsequent edit the SO software appears to have sanitized the character.
I include below the function & holder class I am using to parse the CSV file - as you can see, I am doing nothing strange that might introduce unexpected characters. The dangerous character appears in the "Town" field.
public List<PaidBusiness> parseCSV(string path)
{
    List<PaidBusiness> parsedBusiness = new List<PaidBusiness>();
    List<string> parsedBusinessNames = new List<string>();
    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;
            bool first = true;
            while ((line = readFile.ReadLine()) != null)
            {
                if (first)
                    first = false;
                else
                {
                    row = line.Split(',');
                    PaidBusiness business = new PaidBusiness(row);
                    if (!business.bad) // no problems with the formatting of the business (no missing fields, etc)
                    {
                        if (!parsedBusinessNames.Contains(business.CompanyName))
                        {
                            parsedBusinessNames.Add(business.CompanyName);
                            parsedBusiness.Add(business);
                        }
                    }
                }
            }
        }
    }
    catch (Exception e)
    { }
    return parsedBusiness;
}
public class PaidBusiness
{
    public String CompanyName, EmailAddress, ContactFullName, Address, Address2, Address3, Town, County, Postcode, Region, Country, BusinessCategory, WebAddress;
    public String latitude, longitude;
    public bool bad;
    public static int noCategoryCount = 0;
    public static int badCount = 0;
    public PaidBusiness(String[] parts)
    {
        bad = false;
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace("pithawala", ",");
            parts[i] = parts[i].Replace("''", "'");
        }
        CompanyName = parts[0].Trim();
        EmailAddress = parts[1].Trim();
        ContactFullName = parts[2].Trim();
        Address = parts[6].Trim();
        Address2 = parts[7].Trim();
        Address3 = parts[8].Trim();
        Town = parts[9].Trim();
        County = parts[10].Trim();
        Postcode = parts[11].Trim();
        Region = parts[12].Trim();
        Country = parts[13].Trim();
        BusinessCategory = parts[15].Trim();
        WebAddress = parts[16].Trim();
        // data testing
        if (CompanyName == "")
            bad = true;
        if (EmailAddress == "")
            bad = true;
        if (Postcode == "")
            bad = true;
        if (Country == "")
            bad = true;
        if (BusinessCategory == "")
            bad = true;
        if (Address.ToLower().StartsWith("po box"))
            bad = true;
        // its ok if there is no contact name.
        if (ContactFullName == "")
            ContactFullName = CompanyName;
        //problem if there is no business category.
        if (BusinessCategory == "")
            noCategoryCount++;
        if (bad)
            badCount++;
    }
}
Upvotes: 1
Views: 356
Reputation: 133995
Welcome to real-world data. It's likely that the problem is in the CSV file. To verify, read the line and inspect each character:
foreach (char c in line)
{
    Console.WriteLine("{0}, {1}", c, (int)c);
}
A "normal" hyphen will give you a value of 45.
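For long lines it may be easier to report only the characters that look out of place. The following is a minimal sketch of that idea; the class name `SuspectCharFinder` and its `Find` method are hypothetical helpers, not part of your code or of any library, and the rule "control character or anything above 127" is just an assumption about what counts as suspicious in this data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists the characters in a line that are either
// control characters or outside the 7-bit ASCII range.
public static class SuspectCharFinder
{
    // Returns (index, UTF-16 code unit value) pairs for each suspicious character.
    public static List<KeyValuePair<int, int>> Find(string line)
    {
        var hits = new List<KeyValuePair<int, int>>();
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (char.IsControl(c) || c > 127)
                hits.Add(new KeyValuePair<int, int>(i, (int)c));
        }
        return hits;
    }
}
```

Calling `SuspectCharFinder.Find(line)` inside your read loop and printing each pair (for example with `Console.WriteLine("index {0}: U+{1:X4}", hit.Key, hit.Value)`) should point straight at the offending position in the "Town" field.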
The other problem could be that you're reading the file using the wrong encoding. It could be that the file is encoded as UTF-8 and you're reading it with the default encoding. You might try specifying UTF-8 when you open the file:
using (StreamReader readFile = new StreamReader(path, Encoding.UTF8))
Do that, and then output each character on the line again (as above), and see what character you get for the hyphen.
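If the character really is a stray control character in the file rather than an encoding mismatch, one possible workaround is to replace control characters in each field before building the request URL. This is only a sketch under that assumption: `FieldCleaner` and `ReplaceControlChars` are hypothetical names, and substituting a hyphen is a guess that happens to suit this particular town name, not a general rule.

```csharp
using System;
using System.Text;

// Hypothetical helper: swaps every control character in a field for a
// caller-chosen replacement, leaving all other characters untouched.
public static class FieldCleaner
{
    public static string ReplaceControlChars(string value, char replacement)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
            sb.Append(char.IsControl(c) ? replacement : c);
        return sb.ToString();
    }
}
```

In the PaidBusiness constructor this could be used as `Town = FieldCleaner.ReplaceControlChars(parts[9], '-').Trim();`. If the per-character output shows a code above 127 instead of a control character, fixing the encoding passed to StreamReader is the better route.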
Upvotes: 2