Regex (or other solution) to get all words in string, including emoticons, and stripped punctuations

Question

For example:

Hello! :)  It's a good day to-day :D  'Aight? <3

It would return:

Hello
:)
It's
a
good
day
to-day
:D
'Aight
<3

You may assume that all emoticons are two characters long. Also, if it helps, only 'forwards' emoticons are likely to be encountered.

The case without emoticons is trivial, but handling them, while also stripping punctuation from the other words, is tripping me up.

Is there a quick way to do this, besides using .split and running a block to check each word logically?

newfurniturey · Accepted Answer

The following regex should find any words (without punctuation other than a dash/single-quote/underscore), or a 2-character emoticon:

\s*(?:([a-zA-Z0-9\-\_\']+)|([\:\;\=\(\)\{\}\<3dDpP]{2}))\s*

Regex Explained:

\s*                                 # optional leading whitespace
(?:
    ([a-zA-Z0-9\-\_\']+)            # one or more alphanumeric characters, dashes, underscores, or single-quotes
    |
    ([\:\;\=\(\)\{\}\<3dDpP]{2})    # any 2 characters commonly found in emoticons, including
                                    # parentheses for :), the number 3 for <3, and D for :D
)
\s*                                 # optional trailing whitespace
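
As a rough illustration of how this pattern might be applied, here is a minimal sketch in Python. The question does not name a language, so Python is an assumption, as are the names TOKEN_RE and tokenize. The sketch drops the capture groups and the surrounding \s*, because re.findall already skips any characters that do not match, and its emoticon class includes parentheses so that :) is matched.

```python
import re

# Words (letters, digits, dash, underscore, single-quote) are tried first;
# otherwise any two characters from the emoticon class are taken together.
# TOKEN_RE and tokenize are illustrative names, not part of the answer.
TOKEN_RE = re.compile(r"[a-zA-Z0-9\-_']+|[:;=(){}<3dDpP]{2}")


def tokenize(text):
    """Return words and 2-character emoticons in order of appearance."""
    # With no capture groups, findall returns the full matches as strings.
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for token in tokenize("Hello! :)  It's a good day to-day :D  'Aight? <3"):
        print(token)
```

Two caveats with this approach: the emoticon class accepts any two of its characters, so sequences such as << or ;; would also be returned as emoticons, and keeping the capture groups from the pattern above would make re.findall return a tuple per match rather than a plain string.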
