Jason

Reputation: 51

Leave existing HTML entities as-is, but convert double-quotes and single-quotes

I'm using PHP code to generate my meta description tag, like so:

<meta name="description" content="<?php
echo $this->utf->clean_string(word_limiter(strip_tags(trim($paperResult['file_content'])),27));
?>" />


Here's an example of the meta description output:

<meta name="description" content="blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah" />

The two HTML entities in that example meta description are a paragraph sign (&#182;) followed by an ellipsis (&#8230;). They are already in HTML entity form in the source text, so I want them to remain unchanged. The problem is that I also need the quotation marks within the description to convert to &quot; in order to prevent the meta tag from breaking. Every combination/configuration that I try either does not work or breaks my site because I'm getting the code wrong. For example, when I try the following code, the quotation marks convert to their HTML entity, as desired, but the paragraph symbol and ellipsis entities break because the ampersand character at the beginning of the existing HTML entities gets converted to &amp;. That leaves me with a broken &#182; (&amp;#182;) and a broken &#8230; (&amp;#8230;) :

 echo $this->utf->clean_string(word_limiter(htmlspecialchars(strip_tags(trim($paperResult['file_content']))),27));
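Here is a minimal reproduction of the double encoding. It uses a hypothetical sample string in place of $paperResult['file_content'] and leaves out word_limiter() and clean_string(), so treat it as an illustration rather than the exact site code:

```php
<?php
// Hypothetical sample text standing in for $paperResult['file_content'].
$text = 'blah &#182; &#8230; "quoted" blah';

// double_encode defaults to true, so the "&" that starts each existing
// entity is escaped again and &#182; becomes &amp;#182;.
$broken = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $broken, "\n";
```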

I've been trying—literally, for days—to figure this out. I've searched extensively in Stack Overflow, to no avail. I just need the existing HTML entities to remain unchanged and quotation marks to be converted to their HTML entity (&quot;). I have studied the ENT_QUOTES option and I know that the solution probably exists therein, but I can't figure out how to incorporate it into my particular line of code. I'm hoping that you PHP gurus will have mercy on this tortured soul! I'd truly appreciate your help.

Thank you!

Upvotes: 1

Views: 4952

Answers (2)

ArtisticPhoenix

Reputation: 21681

If it's the contents of the "content" attribute, you can do this:

$str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
echo htmlentities($str, ENT_QUOTES, "UTF-8", false);

Output

blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah


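The key thing here is the fourth argument, double_encode (signature as documented before PHP 8.1, which changed the default flags to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401):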

string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

Specifically

double_encode: When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.

That way the leading & of an existing entity such as &#182; is not turned into &amp;, so nothing gets double encoded.
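A small side-by-side comparison of the two double_encode settings may help. The sample string is made up; note that with double_encode off, a bare & that is not part of a valid entity is still escaped, so the output stays valid HTML:

```php
<?php
// Made-up input: a numeric entity, a named entity, quotes, and a bare "&".
$text = 'a &#182; &amp; "b" & c';

// double_encode = true (the default): every "&" is escaped, even in entities.
$doubled = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', true);

// double_encode = false: valid entities are left alone, the bare "&" is still escaped.
$kept = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);

echo $doubled, "\n", $kept, "\n";
```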

htmlspecialchars also has a double_encode argument:

htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

$str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
echo htmlspecialchars($str, ENT_QUOTES, "UTF-8", false);

Output

blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah


If it's the whole tag, you'll have to pull out the contents, escape them, and put them back, so that the tag's own < and > (and its delimiting quotes) are preserved. It's not clear from the question whether that is the case.
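A rough sketch of the whole-tag case, assuming the tag always has exactly the layout shown in the question (the prefix and suffix strings and the function name fixMetaDescriptionTag are hypothetical). It uses str_starts_with/str_ends_with, which need PHP 8.0 or later. Because the unescaped quotes make the tag invalid, it slices by fixed prefix and suffix instead of parsing it as HTML:

```php
<?php
// Hypothetical helper: escape only the text between the fixed prefix and suffix.
function fixMetaDescriptionTag(string $tag): string
{
    $prefix = '<meta name="description" content="';
    $suffix = '" />';

    // Leave anything that does not match the assumed layout untouched.
    if (!str_starts_with($tag, $prefix) || !str_ends_with($tag, $suffix)) {
        return $tag;
    }

    // Cut out the attribute value, escape quotes, keep existing entities.
    $inner = substr($tag, strlen($prefix), -strlen($suffix));
    return $prefix . htmlspecialchars($inner, ENT_QUOTES, 'UTF-8', false) . $suffix;
}

$fixed = fixMetaDescriptionTag('<meta name="description" content="blah &#182; "quoted" blah" />');
echo $fixed, "\n";
```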

PS: There is not a whole lot of difference between htmlspecialchars and htmlentities. htmlspecialchars only converts &, <, >, " and (with ENT_QUOTES) ', while htmlentities converts every character that has a named HTML entity, such as é (to &eacute;) and other accented letters.
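To see the difference on a made-up string (the source file is assumed to be saved as UTF-8):

```php
<?php
$text = 'café "menu" &#182;';

// Only &, <, >, " and ' are converted; é stays a literal UTF-8 character.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);

// Every character with a named entity is converted, so é becomes &eacute;.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8', false);

echo $special, "\n", $entities, "\n";
```

For a UTF-8 page either output works in an attribute; htmlspecialchars just produces shorter text.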

UPDATE

"I need the solution to be incorporated into my particular format of PHP code (i.e., a single line of PHP that maintains my existing functions/functionality), as miken32 brilliantly did above."

To put it in your code:

<meta name="description" content="<?=htmlspecialchars(word_limiter(trim($paperResult['file_content']),27),ENT_QUOTES,"UTF-8",false);?>"/>

UPDATE2

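preg_replace('/[\r\n]+/', ' ', $string) replaces each run of one or more \r or \n characters (that's what the + does) with a single space. It may be better to use preg_replace(['/[\r\n]+/', '/\s+/'], ' ', $string), which collapses runs of spaces too. Since \s already matches \r and \n, preg_replace('/\s+/', ' ', $string) on its own gives the same result.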

 <meta name="description" content="<?=htmlspecialchars(word_limiter(preg_replace('/[\r\n]+/', ' ', trim($paperResult['file_content'])),27),ENT_QUOTES,"UTF-8",false);?>"/>
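A quick check of the whitespace handling on a made-up multi-line string, comparing the two-pattern version with a single /\s+/ pattern:

```php
<?php
$raw = "  line one\r\nline two\n\n   line   three  ";

// Two passes: line breaks first, then any remaining runs of whitespace.
$twoPass = preg_replace(['/[\r\n]+/', '/\s+/'], ' ', trim($raw));

// One pass: \s already covers \r and \n.
$onePass = preg_replace('/\s+/', ' ', trim($raw));

echo $onePass, "\n";
```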

Basically, anything that makes the text shorter you probably want to do before word_limiter (whatever that is), and anything that makes it longer, like changing " to &quot;, you probably want to do after (maybe). It just seems more logical to me.
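Here is a sketch of that ordering as one function. limit_words() is a hypothetical stand-in for word_limiter(); it assumes the real helper keeps the first N space-separated words and appends an &#8230; entity when it truncates, which is also why double_encode is set to false in the last step:

```php
<?php
// Hypothetical stand-in for word_limiter(): keep the first $limit words.
function limit_words(string $text, int $limit): string
{
    $words = explode(' ', $text);
    if (count($words) <= $limit) {
        return $text;
    }
    return implode(' ', array_slice($words, 0, $limit)) . '&#8230;';
}

// Hypothetical pipeline: shortening steps first, escaping last.
function meta_description(string $raw, int $limit): string
{
    // Shorten: trim and collapse all whitespace (including newlines).
    $text = preg_replace('/\s+/', ' ', trim($raw));
    // Limit the word count on the raw, unescaped text.
    $text = limit_words($text, $limit);
    // Lengthen last: escape quotes, keep existing entities such as &#8230;.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

$desc = meta_description("one \"two\"\nthree &#182; four", 3);
echo $desc, "\n";
```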

Cheers!

Upvotes: 3

miken32

Reputation: 42743

I can't be certain since you don't tell us what all those other functions do, but it seems like you could just do this:

<meta name="description" content="<?=htmlspecialchars(html_entity_decode(word_limiter($paperResult['file_content'], 27)))?>"/>

So limit your word count, turn any entities into characters, and then turn any special characters back into entities again. There's no need to be stripping tags and such for security, since htmlspecialchars will ensure any output is safe for inclusion in HTML.
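The decode-then-encode idea on its own, using a made-up string. The flags and encoding are passed explicitly here (an addition to the one-liner above) so the result does not depend on the PHP version's defaults. Note that the original entities come back as literal UTF-8 characters rather than as &#182; and &#8230;, which is fine on a UTF-8 page:

```php
<?php
$text = 'blah &#182; &#8230; "quoted" blah';

// Turn every entity into its UTF-8 character first...
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// ...then escape only the HTML-special characters again.
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```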

Upvotes: 1
