Reputation: 143
I have a method ReadJsonUrl that takes the URL of a JSON document as a string (address), for example: https://www.ah.nl/service/rest/delegate?url=%2Fproducten%2Fproduct%2Fwi224732%2Fsmiths-nibb-it-happy-ones-kruis-rond-paprika
The method reads the JSON and prints some of the data to the console.
The problem is that the product description is printed as
Smiths Nibb-it hap-py on-es kruis-rond pa-pri-ka
but if I check the JSON in my browser it shows
Smiths Nibb-it happy ones kruis-rond paprika
and that's how I want it to print.
My guess is that the request is treated as coming from a browser with a 0px by 0px viewport, so the server returns the words split up to keep them readable. If I make my browser window really small, it also shows the description with the dashes. I added a User-Agent header in my code, but that didn't help.
Does anyone have an idea how to fix this?
My code:
public static async Task<object> ReadJsonUrl(string address)
{
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36");
        HttpResponseMessage response = await client.GetAsync(address);
        var content = await response.Content.ReadAsStringAsync();
        //JObject obj = JObject.Parse(content);
        var data = Empty.FromJson(content);
        var product = data.Embedded.Lanes[4].Embedded.Items[0].Embedded.Product;
        Console.WriteLine(product.Id);
        Console.WriteLine(product.Description);
        Console.WriteLine(product.PriceLabel.Now);
        Console.WriteLine(product.Availability.Label);
        Console.WriteLine("-------------------------------------");
        System.Threading.Thread.Sleep(5000);
        //the return value is for later use
        return product;
    }
}
Upvotes: 2
Views: 446
Reputation: 59410
If you copy and paste your second string (the expected output) into a hex editor, you will see that it contains 0xAD characters. These are soft hyphens (U+00AD).
A browser like Internet Explorer or Firefox displays a soft hyphen only where necessary (at a line break), but the console displays every one of them.
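If you don't have a hex editor at hand, you can also check for soft hyphens from code. The following is only a minimal sketch; the class name SoftHyphenCheck, both method names, and the "[SHY]" marker are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;

public static class SoftHyphenCheck
{
    // Counts the U+00AD (SOFT HYPHEN) characters in a string.
    public static int CountSoftHyphens(string text)
    {
        return text.Count(c => c == '\u00AD');
    }

    // Replaces each soft hyphen with the visible marker "[SHY]"
    // so its position shows up in console output.
    public static string RevealSoftHyphens(string text)
    {
        return text.Replace("\u00AD", "[SHY]");
    }
}
```

Calling Console.WriteLine(SoftHyphenCheck.RevealSoftHyphens(product.Description)) would then show where the soft hyphens sit, assuming the description really does contain them.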
Upvotes: 8
Reputation: 16991
To supplement Thomas Weller's answer, which explains the problem very well, here is a function that strips all soft hyphens from a string. It is written as an extension method, so you can use it like this:
Console.WriteLine(product.Description.RemoveSoftHyphens());
The extension method:
using System.Text; // StringBuilder

public static class StringExtensions
{
    public static string RemoveSoftHyphens(this string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Copy every character except U+00AD (soft hyphen).
            if (c != '\u00AD')
            {
                output.Append(c);
            }
        }
        return output.ToString();
    }
}
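If you would rather not loop by hand, the same result can probably be had with string.Replace. This is just a sketch of an alternative; the class and method names below are hypothetical.

```csharp
using System;

public static class SoftHyphenReplace
{
    // Removes every U+00AD (soft hyphen) by replacing it with an empty string.
    // Ordinary hyphens (U+002D), such as the one in "Nibb-it", are left alone.
    public static string RemoveSoftHyphensWithReplace(string input)
    {
        return input.Replace("\u00AD", string.Empty);
    }
}
```

Either version leaves the regular hyphens in the product name intact, since only the U+00AD character is removed.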
As a bit of additional information, here is HTML4's description of the use of soft hyphens:
In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics. If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
Upvotes: 3