Geoffrey Hing

Reputation: 1775

How do I escape closing '/' in HTML tags in JSON with Python?

Note: This question is very close to Embedding JSON objects in script tags, but the responses to that question provide what I already know (that in JSON / == \/). I want to know how to do that escaping.

The HTML spec prohibits closing HTML tags anywhere within a <script> element. So, this causes parse errors:

<script>
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
</script>

In my case, I'm generating the invalid situation by rendering a JSON string inside a Django template, i.e.:

<script>
var assets = {{ json_string }};
</script>

I know that JSON parses \/ the same as /, so if I can just escape my closing HTML tags in the JSON string, I'll be good. But, I'm not sure of the best way to do this.

My naive approach would just be this:

json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')
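One way to check that the replacement is lossless is to decode both strings and compare them. This is only a minimal sketch using the standard-library json module; the names same_data and has_closing_tag are made up for illustration.

```python
import json

json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')

# JSON treats \/ as an escaped /, so both strings decode to the same data.
same_data = json.loads(escaped_json_string) == json.loads(json_string)

# The escaped text no longer contains a literal "</" sequence.
has_closing_tag = '</' in escaped_json_string

print(same_data, has_closing_tag)
```

This only confirms that the data survives the round trip. It does not show that "</" is the only sequence an HTML parser cares about.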

Is there a better way? Or any gotchas that I'm overlooking?

Upvotes: 5

Views: 4767

Answers (1)

cwgem

Reputation: 2809

Updated Answer

Okay, I assumed a few things incorrectly. For escaping the JSON, the simplejson library has an encoder class, JSONEncoderForHTML, that can be used. You may need to install simplejson via pip or easy_install if the import fails. Then you can do something like this:

import simplejson
# Decode the existing JSON string, then re-encode it with HTML-safe escapes.
assets_json = simplejson.loads(json_string)
encoded = simplejson.encoder.JSONEncoderForHTML().encode(assets_json)

encoded will then contain this:

'{"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "\\u003cscript\\u003e\\u003c/script\\u003e", "asset_created": null}'

This is a more general solution than the slash replace, as it handles other encoding caveats as well.

The loads call is only needed because the JSON is already encoded. You can avoid it by generating the JSON with simplejson directly, rather than with Django, if possible:

simplejson.dumps(your_object_to_encode, cls=simplejson.encoder.JSONEncoderForHTML)
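If installing simplejson is not an option, similar escaping can be sketched with the standard-library json module. The helper name dumps_for_html below is hypothetical (it is not part of simplejson or Django). It imitates the \u003c and \u003e escapes shown in the output above; escaping & as well is an extra assumption on my part.

```python
import json

def dumps_for_html(obj):
    """Hypothetical helper: serialize obj to JSON, then escape HTML-sensitive characters."""
    text = json.dumps(obj)
    # In valid JSON, <, > and & can only occur inside string literals,
    # so swapping them for \uXXXX escapes does not change the decoded data.
    return (text.replace('<', '\\u003c')
                .replace('>', '\\u003e')
                .replace('&', '\\u0026'))

assets = [{"asset_created": None, "body": "<script></script>"}]
encoded = dumps_for_html(assets)
decoded = json.loads(encoded)
```

Decoding the result with json.loads gives back the original object, and the encoded text contains no literal < or > for the HTML parser to trip over.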

Old Answer

Try wrapping your script in CDATA:

<script>
//<![CDATA[
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
//]]>
</script>

CDATA is meant to signal to the parser that the contents should be treated as character data, not markup. Otherwise you'll need to use the character escapes that have been mentioned.

Upvotes: 6
