nor0x

Reputation: 1213

Deserialize JSON encoded HTML code with System.Text.Json

I'm using C# to call a REST API that returns a JSON object containing HTML code. Here is an example of the object I'm interested in:

{    
    "Body": "<html class=\"sg-campaigns\"><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1\"><meta content=\"IE=Edge\"><style type=\"text/css\">\r\n<!--\r\nbody ..."
}

I would like to process the HTML further, but because it contains the escape sequences needed to make it a valid JSON string, deserialization with System.Text.Json fails with the following exception:

System.Text.Json.JsonReaderException: '<' is an invalid start of a value.

I have tried the following code to deserialize the content of the Body property:

var options = new JsonSerializerOptions()
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};

var content = JsonSerializer.Deserialize<String>(html, options);

The elements causing errors are for example:

How can the Body property from the object above be cleaned so that it contains only valid HTML? Maybe someone from the community has an idea.

Upvotes: 0

Views: 2630

Answers (1)

Ben Osborne

Reputation: 1542

Create a simple object that contains a Body property (and any other properties you're going to use):

internal class ResponseObject
{
    public string Body { get; set; }
}

Then deserialize the response JSON to that type instead of String:

var content = JsonSerializer.Deserialize<ResponseObject>(html, options);

content.Body will then contain the decoded HTML.
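
For reference, here is a minimal self-contained sketch of the approach, assuming the payload is a single JSON object with a string property named Body as in the question. The BodyExtractor class and its method names are hypothetical and only serve the illustration. The Encoder and WriteIndented options from the question are left out because they only affect how JSON is written, not how it is read. The second method shows an alternative that reads the property through JsonDocument when a dedicated class is not wanted.

```csharp
using System;
using System.Text.Json;

// Same shape as the ResponseObject above; the initializer only avoids
// a nullable warning and is not required for deserialization.
internal class ResponseObject
{
    public string Body { get; set; } = string.Empty;
}

// Hypothetical helper, named for this example only.
internal static class BodyExtractor
{
    // Deserializes the whole JSON object; the deserializer unescapes
    // sequences such as \" and \r\n while reading the string value.
    public static string ExtractBody(string json)
    {
        ResponseObject? response = JsonSerializer.Deserialize<ResponseObject>(json);
        return response?.Body ?? string.Empty;
    }

    // Alternative without a dedicated class: parse the document and
    // read the "Body" property directly. Throws KeyNotFoundException
    // if the property is missing.
    public static string ExtractBodyWithDocument(string json)
    {
        using JsonDocument document = JsonDocument.Parse(json);
        return document.RootElement.GetProperty("Body").GetString() ?? string.Empty;
    }
}
```

Either method is meant to be called with the complete JSON text returned by the API, for example BodyExtractor.ExtractBody(json), rather than with the Body value alone.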

Upvotes: 1
