I'm using C# to call a REST API that returns a JSON object containing HTML code. Here is an example of the object I'm interested in:
{
"Body": "<html class=\"sg-campaigns\"><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1\"><meta content=\"IE=Edge\"><style type=\"text/css\">\r\n<!--\r\nbody ..."
}
I would like to process the HTML further, but because it contains various escape sequences that make it a valid JSON string, deserializing it with System.Text.Json fails with the following exception:
System.Text.Json.JsonReaderException: '<' is an invalid start of a value.
I have tried the following code to deserialize the content of the Body attribute:
using System;
using System.Text.Json;

var options = new JsonSerializerOptions()
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
// html holds the response text returned by the REST API
var content = JsonSerializer.Deserialize<String>(html, options);
The elements causing errors include, for example:
\"
<!--
\r
\n
\t
I'm curious to learn how the Body attribute from the example above can be cleaned so that it contains only valid HTML. Maybe someone from the community has an idea about this.
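The sequences listed above are ordinary JSON string escapes, so by themselves they should not make deserialization fail. Below is a minimal sketch, assuming a hypothetical shortened sample rather than the real API response, with illustrative names (EscapeDemo, DecodeJsonString) that are not from the question: when System.Text.Json reads a JSON string value, it decodes \", \r, \n and \t into real characters and leaves <!-- untouched, whereas text that is not JSON at all, such as raw HTML, is rejected at its first '<' with a JsonException.

```csharp
using System;
using System.Text.Json;

static class EscapeDemo
{
    // Hypothetical helper: parses a JSON string literal and returns the decoded .NET string.
    public static string DecodeJsonString(string jsonLiteral) =>
        JsonSerializer.Deserialize<string>(jsonLiteral)!;

    static void Main()
    {
        // Hypothetical sample: a JSON string literal using the escapes from the question.
        string jsonLiteral = "\"<html class=\\\"sg-campaigns\\\"><head>\\r\\n<!--\\t-->\"";

        // \" becomes a quote, \r\n a CR/LF pair, \t a tab; <!-- is left unchanged.
        Console.WriteLine(DecodeJsonString(jsonLiteral));

        try
        {
            // Raw HTML is not JSON, so the reader rejects the leading '<'.
            DecodeJsonString("<html></html>");
        }
        catch (JsonException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```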
Answer:
Create a simple object that contains a Body property (and any other properties you're going to use):
internal class ResponseObject
{
    public string Body { get; set; }
}
Then deserialize the response JSON to that type instead of String:
var content = JsonSerializer.Deserialize<ResponseObject>(html, options);
content.Body will then contain the decoded HTML.
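Putting these steps together, here is a minimal self-contained sketch. The sample response is a hypothetical shortened version of the one in the question, and Program and ExtractBody are illustrative names, not part of any API. The Encoder and WriteIndented settings from the question only affect serialization (writing JSON), so this sketch omits the options argument when deserializing.

```csharp
using System;
using System.Text.Json;

// Mirrors the answer's ResponseObject; add any other properties you need.
internal class ResponseObject
{
    public string Body { get; set; }
}

internal static class Program
{
    // Hypothetical helper: deserializes the whole response object and returns the decoded Body.
    public static string ExtractBody(string responseJson)
    {
        var response = JsonSerializer.Deserialize<ResponseObject>(responseJson);
        return response?.Body;
    }

    static void Main()
    {
        // Hypothetical, shortened version of the API response from the question.
        string responseJson =
            "{ \"Body\": \"<html class=\\\"sg-campaigns\\\"><head>\\r\\n<style type=\\\"text/css\\\">\\r\\n<!--\\r\\nbody {}\\r\\n--></style></head></html>\" }";

        string html = ExtractBody(responseJson);

        // Prints plain HTML: quotes are unescaped and \r\n are real line breaks.
        Console.WriteLine(html);
    }
}
```

System.Text.Json matches property names case-sensitively by default, which works here because the JSON key and the C# property are both "Body". If the API used different casing, setting PropertyNameCaseInsensitive = true on JsonSerializerOptions would be one way to handle it.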