user3714751

Reputation: 336

PHP preg_replace Confusing error

I have a really strange problem that I have spent many hours on without any success. I have a contenteditable area on my website where users can select emoticons, which they then see instantly in the text they are writing. For messages from user to user I do not care about the length of the text, but for comments I do: I need to count all characters of the string.

Now I have the problem that emoticons are transmitted like this:

<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon emoticon-class-name-for-example-happy">

Of course I want to count only 1 character for each emoticon, so I wrote a regex and tried to replace all emoticons with a '1'. I thought that afterwards it would be easy to get the number of characters used with strlen. But this only works in theory, and I do not know why.

So my regex is:

<img[ ]src=["'].+?["'][ ]class=["']emoticon[ ].+?["'][>]

Next, I tested my regex with the help of phpliveregex.com. You can see the result here. Just click on the preg_replace tab.

Now I was pretty sure that this had to work for me, so I tried it. I wrote a function in PHP:

private function countCharactersOfSpecialUserInput($userInput) {
    $wholeCharacters = 0;
    $input_lines = 'This is a test
                    for<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">my
                    <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">regex 
                    which<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">should
                    be alright <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking">and<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking">
                    match all this emoticons except things like <img dsopjfdojp
                    <img oew> because this ones are not real emoticons! The following is a real one: <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">
                    ';      
    return preg_replace("/<img[ ]src=[\"'].+?[\"'][ ]class=[\"']emoticon[ ].+?[\"'][>]/", "1", $input_lines);
}

My function does not count the characters yet because there is a bug that I do not understand. It will sound impossible, but it is real :-(.
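For reference, the counting step described above (replace each emoticon with one placeholder character, then measure the length) could look roughly like the sketch below. The function name `countCharactersWithEmoticons` is hypothetical, and `mb_strlen` is swapped in for `strlen` on the assumption that the text is UTF-8 and may contain multibyte characters.

```php
<?php
// Hypothetical helper (not from the original post): every emoticon <img>
// tag counts as one character, every other character counts as itself.
function countCharactersWithEmoticons(string $userInput): int
{
    // Same regex as in the question, written in a single-quoted string.
    $pattern = '/<img[ ]src=["\'].+?["\'][ ]class=["\']emoticon[ ].+?["\'][>]/';

    // Replace each emoticon tag with the single placeholder character "1".
    $replaced = preg_replace($pattern, '1', $userInput);
    if ($replaced === null) {
        // preg_replace() returns null when the regex engine reports an error.
        throw new RuntimeException('preg_replace failed, code ' . preg_last_error());
    }

    // strlen() counts bytes; mb_strlen() counts characters (UTF-8 assumed).
    return mb_strlen($replaced, 'UTF-8');
}

echo countCharactersWithEmoticons('Hi <img src="x.gif" class="emoticon Girl">!'), PHP_EOL;
```

The null check is there because a failing pattern would otherwise be silently counted as an empty string.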

If I use the string that is saved in the variable $input_lines, it works well. But if I use the text that a user transmits, it does not work!

I used var_dump as well as print_r to get the data transmitted by the user. Then I copied exactly this string and saved it in the $input_lines variable. And the unbelievable fact is that with the $input_lines variable it works again. No matter what I do, my code does not replace a single emoticon when the text was transmitted dynamically by the user.

Can you imagine anything that could cause this problem? I am clueless and cannot believe that this is real. It has to work; I tried so many other things, but nothing worked for me.

Upvotes: 0

Views: 223

Answers (3)

hek2mgl

Reputation: 157967

The text with the images is actually an HTML snippet, so I would use DOM to parse it:

$input_lines = 'This is a test for<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">my <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">regex which<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">should be alright <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking">and<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking"> match all this emoticons except things like <img dsopjfdojp <img oew> because this ones are not real emoticons! The following is a real one: <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">';

$doc = new DOMDocument();

// Suppress warnings
@$doc->loadHTML($input_lines);

$imgs = $doc->getElementsByTagName("img");
$number_of_imgs = $imgs->length;
echo "Found $number_of_imgs images" . PHP_EOL;

// The plain text is actually the nodeValue of
// the whole snippet.
$text = $imgs->item(0)->parentNode->nodeValue;
$len = mb_strlen($text);

echo "Text length: $len + $number_of_imgs(images)" . PHP_EOL;

See it working: http://3v4l.org/MH5T6
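If a single number is needed, the same DOM idea can be wrapped in a function. This is only a sketch under a few assumptions: the name `countWithDom` is hypothetical, only `<img>` elements whose class list contains `emoticon` are counted as one character, and the input is plain ASCII (non-ASCII input may need an explicit charset hint before `loadHTML()` decodes it correctly).

```php
<?php
// Hypothetical helper (not part of the original answer): text length plus
// one character per <img> whose class list contains "emoticon".
function countWithDom(string $html): int
{
    if ($html === '') {
        return 0; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $emoticons = 0;
    foreach ($doc->getElementsByTagName('img') as $img) {
        $classes = preg_split('/\s+/', trim($img->getAttribute('class')));
        if (in_array('emoticon', $classes, true)) {
            $emoticons++;
        }
    }

    // textContent of the root element is the text with all tags removed.
    $root = $doc->documentElement;
    $textLength = $root === null ? 0 : mb_strlen($root->textContent, 'UTF-8');

    return $textLength + $emoticons;
}

echo countWithDom('Hi<img src="x.gif" class="emoticon Girl">there'), PHP_EOL;
```

Reading the text from the root element rather than from the first image's parent also keeps the function from failing when the input contains no images at all.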

Upvotes: 1

Byte Lab

Reputation: 1626

Why are you using var_dump and print_r to get data from the user? Those functions echo inputs to standard out, they don't actually return strings. Take a look:

php > $num_finds = preg_replace("/<img[ ]src=[\"'].+?[\"'][ ]class=[\"']emoticon[ ].+?[\"'][>]/", "1", $lines);
php > echo($num_finds);
1my1regex which1should be alright 1and1 match all this emoticons except things like <img dsopjfdojp <img oew> because this ones are not real emoticons! The following is a real one: 1

This works fine. If, however, you pass the result of var_dump instead, you get this:

php > $dump_num_finds = preg_replace("/<img[ ]src=[\"'].+?[\"'][ ]class=[\"']emoticon[ ].+?[\"'][>]/", "1", var_dump($lines));
string(718) "<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">my<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">regex which<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">should be alright <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking">and<img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Not-Talking"> match all this emoticons except things like <img dsopjfdojp <img oew> because this ones are not real emoticons! The following is a real one: <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" class="emoticon Girl">"
php > echo $dump_num_finds;

Again, the reason is that var_dump doesn't return anything. You could capture what it echoes with something like ob_start() and ob_get_clean(), but imo that is a poor solution, because the captured text also contains var_dump's string(718) "..." wrapper rather than just the string. You can also pass true as the second parameter to print_r to make it return its output, but I'm having trouble seeing why you'd be using either of these functions in the first place.
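A small sketch of the difference, using a shortened sample string of my own rather than the full input above: `var_dump()` hands back null, while `print_r()` with `true` as its second argument returns the text.

```php
<?php
// Shortened sample input (an assumption, not the original 718-byte string).
$lines = 'a<img src="x.gif" class="emoticon Girl">b';

// var_dump() prints its argument and returns null. The output buffer only
// keeps this example from printing the dump.
ob_start();
$returned = var_dump($lines);
ob_end_clean();

// print_r() with true as the second argument returns the text instead.
$asString = print_r($lines, true);

var_export($returned);            // shows what var_dump() handed back
echo PHP_EOL;
var_export($asString === $lines); // for a plain string, print_r() returns it unchanged
echo PHP_EOL;
```

So anything that receives the result of `var_dump($lines)` gets null, not the string.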

P.S. As a side note, in my opinion, your regex is a bit sloppy. You should use \s to signify a whitespace character instead of [ ]. You could also just use a literal space without the brackets and it would do the same thing as [ ]. Also, you don't need the brackets around the last >:

<img\ssrc=["'].+?["']\sclass=["']emoticon\s.+?["']>
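To check that the simplified pattern behaves like the original on a typical emoticon tag, a quick comparison could look like this. The sample string is made up for illustration. Note that \s also matches tabs and newlines, so the two patterns are only equivalent when the separators are plain spaces.

```php
<?php
// Made-up sample with a single emoticon tag.
$sample = 'x<img src="a.gif" class="emoticon Girl">y';

// Pattern from the question and the simplified version suggested above.
$original   = '/<img[ ]src=["\'].+?["\'][ ]class=["\']emoticon[ ].+?["\'][>]/';
$simplified = '/<img\ssrc=["\'].+?["\']\sclass=["\']emoticon\s.+?["\']>/';

$a = preg_replace($original, '1', $sample);
$b = preg_replace($simplified, '1', $sample);

// Both calls should turn the tag into the same placeholder.
var_export($a === $b);
echo PHP_EOL;
```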

Upvotes: 0

Tarquin

Reputation: 492

It would be prudent to store emoticons in the database as text. For example, a happy face can be stored as :) or =) and use up only 2 characters in your database.

Then, on output, do the OPPOSITE of what you are doing here and use preg_replace to replace all instances of :) or =) etc. with the relevant <img src=...

This is almost the standard used in all web applications. It will allow you to dynamically change which emoticons you are using later. For example, if you change your template and want the emoticons to change as well, you change your emoticon function and all previous occurrences in the database will also change.

This would help you not only with counting characters but also with the future management and cleanliness of your database.

<?php
    $input = 'Hello There! :) How are you today?';
    $happy = '<img src="img/smile.gif" border="0" />';

    // "/" delimiters; only ")" needs escaping, ":" does not.
    $output = preg_replace('/:\)/', $happy, $input);

    echo $output;
?>


Obviously you could go as far as using a database to manage your smilies and an array to drive preg_replace. The sky is the limit.
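A minimal sketch of the array idea, assuming a hard-coded map instead of a database. The function name `renderEmoticons`, the `alt` attributes and the `img/sad.gif` path are placeholders of mine, and `strtr()` is swapped in for preg_replace because the shortcodes are fixed strings.

```php
<?php
// Hypothetical shortcode map; the image paths are placeholders.
$emoticons = [
    ':)' => '<img src="img/smile.gif" alt=":)">',
    '=)' => '<img src="img/smile.gif" alt="=)">',
    ':(' => '<img src="img/sad.gif" alt=":(">',
];

function renderEmoticons(string $text, array $map): string
{
    // strtr() with an array replaces every key and does not rescan text it
    // has already replaced, so the ":)" inside alt="..." is left alone.
    return strtr($text, $map);
}

$stored = 'Hello :) bye :(';

echo renderEmoticons($stored, $emoticons), PHP_EOL;

// The length check runs on the stored text, before any <img> tags exist.
echo mb_strlen($stored, 'UTF-8'), PHP_EOL;
```

With this layout each emoticon costs two characters in the count, because the stored shortcode is what gets measured.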

Upvotes: 0
