Reputation: 6060
I'm getting some data from the database and it has HTML-encoded characters in it. What options are there for removing these?
I don't want these rendered at all...I want them stripped from the data.
At the moment I'm not worried about the HTML tags...just the encoded characters.
EDIT: If it's relevant these chars are causing some errors in JSON validation.
Upvotes: 1
Views: 2363
Reputation: 305
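Simply trimming by regexp should not be an option here. For example, a character that has a named entity can be coded as a numeric entity as well, but stripping everything that matches the regex &#[0-9]+; would lead to data loss, since almost every char can become encoded like that at some point (ex.: <p>HELLO</p> written entirely as numeric entities).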
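To illustrate the data-loss point, here is a minimal Python sketch. The sample string is made up, and `html.unescape` from the standard library is used as one possible way to decode instead of strip; it is not something the original posts mention.

```python
import html
import re

# Made-up sample: markup and an accented letter, all stored as numeric entities.
encoded = "&#60;p&#62;HELLO&#60;/p&#62; caf&#233;"

# Stripping every numeric entity also throws away real content.
stripped = re.sub(r"&#[0-9]+;", "", encoded)

# Decoding first keeps the characters the entities stand for.
decoded = html.unescape(encoded)

print(repr(stripped))
print(repr(decoded))
```

Once the text is decoded, it is easier to remove only the specific characters you actually don't want.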
Upvotes: 0
Reputation: 4033
If you want to get rid of them, obtain a list of such characters or a RegExp matching them all (something like &[a-z]+;) and do a search-and-replace.
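A rough sketch of that search-and-replace in Python, assuming you really do want every entity removed. The pattern and the `strip_entities` name are my own; the pattern is a bit wider than &[a-z]+; so that it also catches uppercase names, digits in names, and the decimal and hex numeric forms.

```python
import re

# Hypothetical pattern: named (&nbsp;), decimal (&#160;) and hex (&#xA0;) entities.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def strip_entities(text: str) -> str:
    """Remove anything that looks like an HTML entity; the content it encodes is lost."""
    return ENTITY_RE.sub("", text)

print(strip_entities("a&nbsp;b&#160;c&#xA0;d"))
```

A bare ampersand that is not followed by a name and a semicolon is left alone, so text like "fish & chips" passes through unchanged.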
However, if you only want them gone due to errors in JSON validation, you should correctly generate/encode your JSON to avoid the errors. (However, I don't really understand how they can cause invalid JSON.)
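As a sketch of what "correctly generate" could look like, assuming the JSON is currently built by hand and that Python is available (the question doesn't say either way): a serializer such as `json.dumps` escapes quotes and newlines itself. An entity like &amp;amp; is plain text and is already valid inside a JSON string, so the errors may come from raw control characters or unescaped quotes instead. The `row` value below is made up.

```python
import json

# Made-up row: contains a newline, double quotes and an HTML entity.
row = {"title": 'Line one\nLine "two" &amp; more'}

# json.dumps escapes the newline and the quotes; the entity passes through as-is.
payload = json.dumps(row)

# Round-trip check: parsing the payload gives back the original data.
restored = json.loads(payload)
print(payload)
```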
Upvotes: 1