I am analysing informal chat-style messages for sentiment and other information. I need every emoticon to be replaced with its actual meaning, to make it easier for the system to parse the message.
At the moment I have the following code:
$str = "Am I :) or :( today?";
$emoticons = array(
    ':)' => 'happy',
    ':]' => 'happy',
    ':(' => 'sad',
    ':[' => 'sad',
);
$str = str_replace(array_keys($emoticons), array_values($emoticons), $str);
This does a direct string replacement, so it does not take into account whether the emoticon is surrounded by other characters.
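A small illustration of the problem, using a made-up input: the ":(" in "Note:(see below)" is ordinary punctuation, not an emoticon, but str_replace() has no notion of context and rewrites it anyway.

```php
<?php
// Same table as in the question.
$emoticons = array(
    ':)' => 'happy',
    ':]' => 'happy',
    ':(' => 'sad',
    ':[' => 'sad',
);

// Made-up input: ":(" here is part of the text, not an emoticon.
$str = "Note:(see below)";

// str_replace() ignores the surrounding characters, so ":(" is replaced regardless.
$result = str_replace(array_keys($emoticons), array_values($emoticons), $str);
echo $result, "\n"; // prints "Notesadsee below)"
```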
How can I use a regex and preg_replace to determine whether it is actually an emoticon and not part of a longer string?
Also, how can I extend my array so that, for example, the happy element can contain both entries, :) and :]?
For maintainability and readability, I would change your emoticons array to:
$emoticons = array(
    'happy' => array( ':)', ':]'),
    'sad' => array( ':(', ':[')
);
Then you can build a lookup table identical to the one you originally had:
$emoticon_lookup = array();
foreach( $emoticons as $name => $values) {
    foreach( $values as $emoticon) {
        $emoticon_lookup[ $emoticon ] = $name;
    }
}
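A self-contained sketch of this inversion step, so you can check what the loop produces. It builds the flat table from the grouped array and prints it; the result has the same emoticon => meaning pairs, in the same order, as the array in the question.

```php
<?php
// Grouped structure from the answer: one meaning, several emoticons.
$emoticons = array(
    'happy' => array(':)', ':]'),
    'sad'   => array(':(', ':['),
);

// Invert it into a flat "emoticon => meaning" table.
$emoticon_lookup = array();
foreach ($emoticons as $name => $values) {
    foreach ($values as $emoticon) {
        $emoticon_lookup[$emoticon] = $name;
    }
}

print_r($emoticon_lookup);
```

print_r() lists [:)] => happy, [:]] => happy, [:(] => sad and [:[] => sad. The benefit of the grouped form is that supporting another emoticon for an existing meaning is a one-element change to the inner array.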
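Now you can build a regex dynamically from the keys of the lookup array. Note that this regex requires a non-word boundary (\B) on both sides of the emoticon, so an emoticon directly attached to a letter, digit or underscore is not matched (this works because these emoticons begin and end with non-word characters). Change it to suit your needs.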
$escaped_emoticons = array_map( 'preg_quote', array_keys( $emoticon_lookup), array_fill( 0, count( $emoticon_lookup), '/'));
$regex = '/\B(' . implode( '|', $escaped_emoticons) . ')\B/';
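To see what the \B anchors accept, here is a sketch that builds the same alternation and tests it on a few made-up strings. It wraps preg_quote() in a closure, which gives the same result as the array_fill() call above. The second pattern, using the lookarounds (?<!\S) and (?!\S), is an assumed stricter alternative, not part of the answer: it only accepts an emoticon surrounded by whitespace or the start/end of the string.

```php
<?php
// Lookup table as built in the answer.
$emoticon_lookup = array(':)' => 'happy', ':]' => 'happy', ':(' => 'sad', ':[' => 'sad');

// Escape each emoticon for a /-delimited pattern.
$escaped_emoticons = array_map(function ($emoticon) {
    return preg_quote($emoticon, '/');
}, array_keys($emoticon_lookup));
$alternation = implode('|', $escaped_emoticons);

// The answer's pattern: no word character directly before or after the emoticon.
$non_word_regex = '/\B(' . $alternation . ')\B/';

// Assumed stricter alternative: only whitespace or the string edges may surround it.
$whitespace_regex = '/(?<!\S)(' . $alternation . ')(?!\S)/';

// Made-up samples; preg_match() returns 1 on a match and 0 otherwise.
foreach (array('Am I :) today', 'Note:(see below)', 'Am I (:) today') as $sample) {
    printf("%-18s \\B: %d  whitespace: %d\n",
        $sample,
        preg_match($non_word_regex, $sample),
        preg_match($whitespace_regex, $sample));
}
```

Both patterns reject the ":(" in "Note:(see below)", but only the \B version accepts the ":)" inside "(:)", because "(" is a non-word character.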
And then use preg_replace_callback() with a custom callback to implement the replacement:
$str = preg_replace_callback( $regex, function( $match) use( $emoticon_lookup) {
    return $emoticon_lookup[ $match[1] ];
}, $str);
Running this on the example string outputs:
Am I happy or sad today?
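Putting the pieces together, here is a self-contained sketch wrapped in a hypothetical helper, replace_emoticons() (assumes PHP 7 or later for the <=> operator). One assumption goes beyond the answer: the keys are sorted longest-first before building the alternation. PCRE tries alternatives left to right, so a shorter emoticon that is a prefix of a longer one would otherwise win; the ':((' => 'very sad' entry is made up purely to show this.

```php
<?php
// Hypothetical helper: replaces standalone emoticons with their meaning.
function replace_emoticons($str, array $emoticons)
{
    // Flatten "meaning => [emoticons]" into "emoticon => meaning".
    $lookup = array();
    foreach ($emoticons as $name => $values) {
        foreach ($values as $emoticon) {
            $lookup[$emoticon] = $name;
        }
    }

    // Longest emoticons first, so ":((" is tried before its prefix ":(".
    $keys = array_keys($lookup);
    usort($keys, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    // Escape for a /-delimited pattern and require \B on both sides, as in the answer.
    $escaped = array_map(function ($e) {
        return preg_quote($e, '/');
    }, $keys);
    $regex = '/\B(' . implode('|', $escaped) . ')\B/';

    // Swap each matched emoticon for its meaning.
    return preg_replace_callback($regex, function ($match) use ($lookup) {
        return $lookup[$match[1]];
    }, $str);
}

// ':((' => 'very sad' is a made-up entry added to show the ordering issue.
$emoticons = array(
    'happy'    => array(':)', ':]'),
    'sad'      => array(':(', ':['),
    'very sad' => array(':(('),
);

echo replace_emoticons("Am I :) or :( today?", $emoticons), "\n";
echo replace_emoticons("so :(( today", $emoticons), "\n";
```

Without the usort() call, "so :(( today" would come out as "so sad( today", because \B also holds between the two "(" characters.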