Jared

Reputation: 6060

Remove HTML-encoded characters

I'm getting some data from the database and it contains HTML-encoded characters (such as &nbsp;). What options are there for removing them?

I don't want these rendered at all; I want them stripped from the data.

At the moment I'm not worried about the HTML tags, just the encoded characters.

EDIT: In case it's relevant, these characters are causing errors in JSON validation.

Upvotes: 1

Views: 2363

Answers (2)

yan

Reputation: 305

Simply stripping them with a regexp is not a good option here. For example, &nbsp; can also be written as &#160;, but removing matches of a regex like &#[0-9]+; would lead to data loss, since almost any character can be encoded that way at some point (e.g. <p>&#72;&#69;&#76;&#76;&#79;</p>, which decodes to <p>HELLO</p>).
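To make the data-loss point concrete, here is a minimal sketch in Python. The question doesn't name a language, so the use of Python's standard html and re modules is an assumption, and the function names are hypothetical. It compares deleting numeric references with a regex against decoding all references with html.unescape.

```python
import html
import re

# Naive pattern that deletes every numeric character reference (the lossy approach).
NUMERIC_REF = re.compile(r"&#[0-9]+;")


def strip_numeric_refs(text: str) -> str:
    """Delete numeric references outright; the characters they encode are lost."""
    return NUMERIC_REF.sub("", text)


def decode_entities(text: str) -> str:
    """Decode named and numeric references to the characters they stand for."""
    return html.unescape(text)


sample = "<p>&#72;&#69;&#76;&#76;&#79;</p>"
print(strip_numeric_refs(sample))  # <p></p>  (the word is gone)
print(decode_entities(sample))     # <p>HELLO</p>

# &nbsp; and &#160; both decode to the same character, U+00A0 (no-break space).
print(decode_entities("&nbsp;") == decode_entities("&#160;") == "\u00a0")  # True
```

Decoding first and then deciding what to do with specific characters (for example replacing "\u00a0" with a normal space) avoids throwing away text that merely happened to be encoded.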

Upvotes: 0

Jan Schejbal

Reputation: 4033

If you want to get rid of them, obtain a list of such entities or a regular expression matching them all (something like &[a-z]+;) and do a search-and-replace.
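Both variants of the search-and-replace can be sketched in Python (an assumption, since no language is given). ENTITY_LIST is a hypothetical list of whitespace entities; NAMED is the &[a-z]+; pattern from this answer. Note that the broad pattern also removes &amp;, which may stand for a real ampersand, and does not match numeric references such as &#169;.

```python
import re

# Option 1: an explicit list of entities to remove (hypothetical; adjust to your data).
ENTITY_LIST = ["&nbsp;", "&ensp;", "&emsp;"]
LISTED = re.compile("|".join(re.escape(e) for e in ENTITY_LIST))

# Option 2: the pattern from the answer, matching any lowercase named entity.
NAMED = re.compile(r"&[a-z]+;")


def strip_entities(text: str, pattern, replacement: str = "") -> str:
    """Replace every match of the compiled `pattern` with `replacement`."""
    return pattern.sub(replacement, text)


sample = "Tom&nbsp;&amp;&nbsp;Jerry &#169;"
print(strip_entities(sample, LISTED))       # Tom&amp;Jerry &#169;
print(strip_entities(sample, LISTED, " "))  # Tom &amp; Jerry &#169;
print(strip_entities(sample, NAMED))        # TomJerry &#169;
```

Replacing with a space instead of an empty string keeps words that were separated only by &nbsp; from running together.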

However, if you only want them gone because of errors in JSON validation, you should generate/encode your JSON correctly to avoid the errors. (That said, I don't really understand how they could cause invalid JSON.)
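A minimal Python sketch of "encode your JSON correctly", assuming (this is a guess, not something the question states) that the JSON is built by string concatenation. The row contents and is_valid_json are hypothetical. Entities are ordinary text as far as JSON is concerned; what breaks hand-built JSON is unescaped characters such as double quotes, which json.dumps escapes for you.

```python
import html
import json

# Hypothetical database row containing entities and a double quote.
row = {"title": 'Say "hi"&nbsp;&amp; bye'}

# Fragile: hand-built JSON breaks as soon as the value contains a double quote.
hand_built = '{"title": "' + row["title"] + '"}'

# Robust: let the encoder escape the value.
encoded = json.dumps(row)

# Decoding the entities first is also fine; the no-break space is escaped as \u00a0.
decoded = {"title": html.unescape(row["title"])}


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(hand_built))           # False
print(is_valid_json(encoded))              # True
print(is_valid_json(json.dumps(decoded)))  # True
```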

Upvotes: 1
