Reputation: 288
Challenge: Our users have access to a "contentEditable" DIV into which a JS library inserts HTML. Here's how we thought the HTML should show up in the contentEditable:
<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...
We hand over this HTML to PHP, where we execute strip_tags(). This should give us :
[data-user="12345" data-userId="678910"]John Smith[/] ...Blablabla some other text...
Question: When rendering the text on the page, is there a secure, reliable way to convert the custom markdown above into the following (before handing it to Handlebars.js)?
<span class="stylish-blue-button" data-user="12345" data-userId="678910">John Smith</span> ...Blablabla some other text...
Why: This assures us that the user-generated content is handled safely, all while keeping the user-generated markdown in the contentEditable "pretty" ("stylish-blue-button" class).
If you have any suggestions to make this whole process simpler, we're open to changing our markdown's format.
Thank you so much!
Upvotes: 2
Views: 315
Reputation: 23892
You could use a regex like this:
$string = '<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...';
echo preg_replace('~\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*(.*)~s', '<span $1 $2>$3</span>$4', trim(strip_tags($string)));
Here's a regex101 demo explaining exactly what that regex is doing. If you have any particular questions, please ask.
Output:
<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...
A few quick regex notes:
`*` is a quantifier meaning zero or more of the preceding token.
`+` is a quantifier meaning one or more of the preceding token (i.e. at least one is required).
`\s` is any whitespace character (including newlines).
`\h` is a horizontal whitespace character, such as a space or tab.
`.` is any single character (including newlines here, because of the `s` modifier).
`\d` is a single digit (0-9).
`()` are capturing groups; they capture into `$1`, `$2`, etc., in the order they appear.
Looking at that regex again, a quick note: `\[/\]` is read as a literal `[/]`. The backslashes escape the `[` and `]`, which would otherwise create a character class (meaning only the `/` character would be allowed there).
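To see how the capturing groups line up with the pattern, here is a minimal sketch using `preg_match` on the same sample. The `$text` value is an assumption: it is what the sample HTML should look like after `strip_tags()` and `trim()`.

```php
<?php
// Assumed input: the sample HTML after strip_tags() and trim().
$text = "[data-user=\"12345\" data-userId=\"678910\"]\nJohn Smith\n[/]\n\n...Blablabla some other text...";

// Same pattern as in the preg_replace call above.
$pattern = '~\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*(.*)~s';

if (preg_match($pattern, $text, $m)) {
    // $m[0] holds the whole match; $m[1] to $m[4] hold the capturing groups.
    echo $m[1], "\n"; // data-user="12345"
    echo $m[2], "\n"; // data-userId="678910"
    echo $m[3], "\n"; // John Smith
    echo $m[4], "\n"; // ...Blablabla some other text...
}
```

The `$1` to `$4` placeholders in the replacement string refer to the same groups as `$m[1]` to `$m[4]` here.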
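Multiple instances (the trailing `(.*)` is dropped so the pattern matches each tag separately instead of swallowing the rest of the string):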
$string = '<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...
<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...
<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...';
echo preg_replace('~\s*\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*~s', '<span $1 $2>$3</span>', trim(strip_tags($string)));
Output:
<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...
For looser IDs, just change the `\d+` to `[a-zA-Z0-9 ]+`.
So:
echo preg_replace('~\s*\[(data-user="\d+")\h+(data-userId="[a-zA-Z0-9 ]+")\]\s*(.+?)\s*\[/\]\s*~s', '<span $1 $2>$3</span>', trim(strip_tags($string)));
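Since the question asks for the "stylish-blue-button" class on the output and for safe handling of user content, one possible variant is sketched below. It is only a sketch under assumptions: `renderUserTags()` is a hypothetical helper name, the IDs are assumed to be digits only, and the remaining text is escaped with `htmlspecialchars()` before the markers are replaced, because `strip_tags()` alone is not an escaping function.

```php
<?php
/**
 * Hypothetical helper: turns [data-user="N" data-userId="N"]Name[/] markers
 * into styled spans and escapes everything else.
 */
function renderUserTags(string $html): string
{
    // Remove the tags first, as in the answer above.
    $text = trim(strip_tags($html));

    // Escape &, < and > in the remaining text. ENT_NOQUOTES leaves the double
    // quotes inside the markers intact; the last argument (double_encode = false)
    // avoids turning an existing &amp; into &amp;amp;.
    $text = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8', false);

    // Only the digits are captured, so nothing user-controlled reaches the attributes.
    $pattern = '~\s*\[data-user="(\d+)"\h+data-userId="(\d+)"\]\s*(.+?)\s*\[/\]\s*~s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        return sprintf(
            '<span class="stylish-blue-button" data-user="%s" data-userId="%s">%s</span>',
            $m[1],
            $m[2],
            $m[3] // already escaped above
        );
    }, $text);

    // preg_replace_callback() returns null on error; fall back to the escaped text.
    return $result ?? $text;
}

$string = '<span class="stylish-blue-button">
<span style="display:none;">[data-user="12345" data-userId="678910"]</span>
John Smith
<span style="display:none;">[/]</span>
</span>
...Blablabla some other text...';

echo renderUserTags($string);
// <span class="stylish-blue-button" data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...
```

Because the result already contains HTML, it would need to be output unescaped on the template side; whether that fits your Handlebars.js setup is something to check against your own templates.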
Upvotes: 2