Reputation: 604
I am using TinyMCE and it is converting all my attribute single quotes to double quotes on cleanup.
This is what I am putting into the editor.
<tr _excel-dimensions='{"row":{"rowHeight":50}}'>
<td _excel-styles='{"font":{"size":20,"color":{"rgb":"333333"},"bold":true},"fill":{"fillType":"solid","startColor":"F0F0F0"},"alignment":{"horizontal":"center"}}' colspan='6'>Affiliate Accounts</td>
</tr>
And this is what the editor produces after saving:
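<tr _excel-dimensions="{&quot;row&quot;:{&quot;rowHeight&quot;:50}}">
<td _excel-styles="{&quot;font&quot;:{&quot;size&quot;:20,&quot;color&quot;:{&quot;rgb&quot;:&quot;333333&quot;},&quot;bold&quot;:true},&quot;fill&quot;:{&quot;fillType&quot;:&quot;solid&quot;,&quot;startColor&quot;:&quot;F0F0F0&quot;},&quot;alignment&quot;:{&quot;horizontal&quot;:&quot;center&quot;}}" colspan="6">Accounts</td>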
</tr>
There doesn't seem to be a way to override this setting in TinyMCE, so I am turning to a regex in PHP when saving the data to the database. This is what I have so far, but it doesn't seem to capture all the double quotes.
$content = preg_replace_callback('/<(.*)(\")(.*)(\")(.*)>/miU', function($matches) {
return "<" . $matches[1] . "'" . html_entity_decode($matches[3]) . "'" . $matches[5] . ">";
}, $content);
It replaces the quotes around the JSON-encoded string, but not those in colspan="6".
Thanks in advance for the help.
Upvotes: 2
Views: 94
Reputation: 4329
As I said in the comment, parsing HTML with regex is fragile; it is better to use a dedicated library such as PHP Simple HTML DOM Parser. However, it is possible to construct a regex that works on well-formed HTML.
Our goal is to find all double-quoted strings inside a tag. First, let's forget about the requirement that the double-quoted string must be inside a tag. Then we can use this:
$content = preg_replace_callback('/"(.*?)"/',
    function($matches) {
        // Swap the surrounding quotes and decode entities such as &quot; in the value
        return "'" . html_entity_decode($matches[1]) . "'";
    },
    $content);
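To see why the extra check is needed, here is a small sketch of what this first version does to quotes that are not part of a tag. The sample input is made up for illustration and is not from the question.

```php
<?php
// Hypothetical sample: one attribute plus a quoted phrase in the text content.
$content = '<td colspan="6">He said "hi"</td>';

$result = preg_replace_callback('/"(.*?)"/',
    function ($matches) {
        return "'" . html_entity_decode($matches[1]) . "'";
    },
    $content);

// Both the attribute quotes and the quotes in the text are rewritten.
echo $result, "\n"; // <td colspan='6'>He said 'hi'</td>
```

The attribute is converted as intended, but so is the quoted phrase in the cell text, which is what the tag check is meant to prevent.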
Now we need to add the check that the double-quoted string is inside a tag. To do this we construct a lookahead expression that checks the text from the opening double quote to the end of the text:
First, the closing > of the current tag must be there. It means that there must be some sequence of non-<, non-> characters followed by >. The corresponding regex is [^<>]*>
Then there may be any number of complete tags, each delimited by < and >. The regex for a group of characters containing a single tag is [^<]*<[^>]*>. We need to repeat this group any number of times: (?:[^<]*<[^>]*>)*
Finally, there may be some non-<, non-> characters till the end of the text: [^<>]*$
The resulting lookahead expression looks a bit terrifying, but does the work:
(?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$)
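As a rough sketch, the lookahead can be assembled from its three pieces and tried at specific positions. The variable names and the helper restLooksLikeInsideTag are hypothetical, introduced only for this illustration, and the sample HTML is made up.

```php
<?php
$closeTag  = '[^<>]*>';            // rest of the current tag, up to its closing >
$moreTags  = '(?:[^<]*<[^>]*>)*';  // any number of "text, then a complete tag" groups
$tail      = '[^<>]*$';            // trailing text without angle brackets
$lookahead = '(?=' . $closeTag . $moreTags . $tail . ')';

// Hypothetical helper: does the text starting at $offset satisfy the lookahead?
// \G anchors the match at the offset passed to preg_match.
function restLooksLikeInsideTag(string $lookahead, string $html, int $offset): bool
{
    return preg_match('/\G' . $lookahead . '/', $html, $m, 0, $offset) === 1;
}

$html   = '<td colspan="6">say "hi"</td>';
$inTag  = strpos($html, '"') + 1;    // just after the quote that opens "6"
$inText = strpos($html, '"hi') + 1;  // just after the quote that opens "hi"

var_dump(restLooksLikeInsideTag($lookahead, $html, $inTag));   // bool(true)
var_dump(restLooksLikeInsideTag($lookahead, $html, $inText));  // bool(false)
```

At the position inside the tag, the remaining text reaches a > before any <, so the check passes. At the position inside the cell text, a < comes first and the check fails.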
Finally, we incorporate this lookahead check into the original regex:
$content = preg_replace_callback('/"(?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$)(.*?)"/',
    function($matches) {
        // Only quotes inside a tag reach this callback
        return "'" . html_entity_decode($matches[1]) . "'";
    },
    $content);
You can check it here: Regex101 demo
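For a quick local check, here is a shortened version of the markup from the question run through the final regex. It assumes the saved markup escapes the inner quotes as &quot; (which the use of html_entity_decode suggests), and the _excel-styles attribute is left out for brevity.

```php
<?php
// Assumed saved form: double-quoted attributes, inner quotes escaped as &quot;.
$content = '<tr _excel-dimensions="{&quot;row&quot;:{&quot;rowHeight&quot;:50}}">' . "\n"
         . '<td colspan="6">Accounts</td>' . "\n"
         . '</tr>';

$result = preg_replace_callback('/"(?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$)(.*?)"/',
    function ($matches) {
        return "'" . html_entity_decode($matches[1]) . "'";
    },
    $content);

echo $result, "\n";
// <tr _excel-dimensions='{"row":{"rowHeight":50}}'>
// <td colspan='6'>Accounts</td>
// </tr>
```

Note that an attribute value containing a literal apostrophe would end the single-quoted attribute early, so this approach suits values like the JSON above that contain no single quotes.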
Upvotes: 2