Reputation: 836
I have a textarea on my page where I allow users to write text. To preserve the line breaks that users make while typing, I use:
editBox.val().replace(/\r?\n/g, "\r\n");
Before the data is uploaded to the database I'm using:
$data = mysql_real_escape_string($data);
I do this to preserve as much data as possible without stripping data that might be useful later on. This also helps me to preserve possible formatting options that can be allowed later. I read that this is good practice.
The problem is:
When the data is pulled from the database I need to clean it. For that I'm using:
function cleanData($data)
{
    $data = nl2br($data);
    $data = strip_tags($data, "<br><b><p><i><h1><h2><h3><h4><h5><h6>");
    return $data;
}
I'm allowing certain tags that will later be used with a little self-made wysiwyg editor. However this allows for users to input the following:
<p title="some junk here">hax</p>
While the title attribute isn't incredibly annoying, other attributes may be. I'm not sure whether a user can add class and id attributes, but I can't see why they shouldn't be able to. It also removes anything that looks like a tag, such as smilies: "*<:o) <- happy clown" would end up looking something like this: "*"
I tried using:
$data = filter_var($data, FILTER_SANITIZE_SPECIAL_CHARS);
instead of the cleanData function. However, this encodes everything, so the <br> tags generated from my line breaks show up as literal text instead of producing line breaks.
In short my problem is:
I can't seem to find a nice way to clean the data so that line breaks/br are preserved while also retaining the possibility of adding some sort of wysiwyg formatting. I don't really care whether it's HTML tags or something like BBCode: [b].
My question is as follows:
Is there a smarter way to do this or is my method fine with a few tweaks?
What would you guys do? :) I would like to avoid using external libraries unless there's a very strong incentive.
PS: I've searched around a lot and found no satisfactory answer. I've also spent a long time making this post readable and understandable. I hope I've done it right.
Upvotes: 3
Views: 2957
Reputation: 19466
First of all,
editBox.val().replace(/\r?\n/g, "\r\n");
That should not be done on the client side (JavaScript), but on the server side (PHP) if you want to be certain that it happens. Client-side code can be circumvented by disabling JavaScript or by posting from another site.
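A minimal sketch of doing the same normalisation in PHP. The function name normalizeLineBreaks is made up for this example, and unlike the JavaScript regex above it also converts a lone "\r":

```php
<?php
// Hypothetical helper: normalise every line-break style to "\r\n" on the server.
function normalizeLineBreaks($text)
{
    // Match "\r\n" first (as one unit), then a lone "\r" or a lone "\n",
    // and replace each occurrence with "\r\n".
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}
```

You would call this on the posted value before storing it, for example on $_POST['text'] (the field name is an assumption).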
Regarding the actual question, I'd go with some pre-made markup language, such as Textile or Markdown (the latter is what's used here on Stack Overflow).
But if you wish to allow for some custom formatting, you could (as you suggest yourself) use BBCode ([b], [i], etc.). The way I'd implement this is to first replace all HTML special characters with their respective HTML entities using htmlspecialchars. After this, you could replace things such as [b] with <strong>, etc.
Example
$str = "See, [b]evil[/b] input<br/>, <i>etc</i>.";
$str = htmlspecialchars($str);
print $str; // "See, [b]evil[/b] input&lt;br/&gt;, &lt;i&gt;etc&lt;/i&gt;."
$str = str_replace(array("[b]", "[/b]"), array("<b>", "</b>"), $str);
print $str; // "See, <b>evil</b> input&lt;br/&gt;, &lt;i&gt;etc&lt;/i&gt;."
To avoid bad markup, you should probably use some regular expressions to replace the BBcode with HTML tags.
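A rough sketch of the regex variant, under a few assumptions: the function name renderBbCode and the small tag map are made up for illustration, and only [b] and [i] are handled. It escapes first, then replaces only matched pairs, then converts line breaks:

```php
<?php
// Hypothetical helper; the tag map below is an assumption, extend as needed.
function renderBbCode($raw)
{
    // 1. Escape all HTML so user-supplied tags and attributes become inert text.
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // 2. Replace only matched [x]...[/x] pairs, so an unclosed [b] stays literal.
    $map = array('b' => 'strong', 'i' => 'em');
    foreach ($map as $bb => $tag) {
        $html = preg_replace(
            '#\[' . $bb . '\](.*?)\[/' . $bb . '\]#s',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }

    // 3. Convert line breaks to <br /> last, after escaping,
    //    so the generated tags are not encoded as text.
    return nl2br($html);
}
```

With this ordering, something like <p title="junk"> is shown as text rather than interpreted, and a smiley such as *<:o) survives intact. Note that this simple approach does not catch crossed tags such as [b][i]x[/b][/i], which would still produce mis-nested HTML; handling that needs a real parser or a stricter check.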
Upvotes: 1