Justin L.

Reputation: 13600

Regex (or other solution) to get all words in string, including emoticons, and stripped punctuations

For example:

Hello! :)  It's a good day to-day :D  'Aight? <3

It would return:

  1. Hello
  2. :)
  3. It's
  4. a
  5. good
  6. day
  7. to-day
  8. :D
  9. 'Aight
  10. <3

One may consider all emoticons to be two characters long...also, if it helps, only 'forwards' emoticons would probably be encountered.

The case without emoticons is trivial, but handling them -- as well as stripping punctuation from the other words -- is sort of tripping me up.

Is there a quick way besides .split and running a block to check each word logically?

Upvotes: 1

Views: 360

Answers (2)

newfurniturey

Reputation: 38456

The following regex should find any words (without punctuation other than a dash/single-quote/underscore), or a 2-character emoticon:

\s*(?:([a-zA-Z0-9\-\_\']+)|([\:\;\=\[\]\{\}\(\)\<3dDpP]{2}))\s*

Regex Explained:

\s*                             # any whitespace
(?:
    ([a-zA-Z0-9\-\_\']+)        # any alpha-numeric character, dashes, underscores, single-quotes
    |
    ([\:\;\=\[\]\{\}\(\)\<3dDpP]{2})    # any 2 characters from a set commonly found in emoticons,
                                # including the digit 3 (for <3) and the letters d, D, p, P (for :D, :P)
)
\s*                             # any whitespace
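
As a rough sketch of how the pattern might be applied in Ruby (the question does not name a language, so Ruby is an assumption based on the other answer), String#scan can collect every match in order. The constant TOKEN and the method tokenize are hypothetical names, and the pattern is a simplified form of the regex above: the capture groups and the surrounding \s* are dropped so that scan returns plain strings rather than one array per match.

```ruby
# Simplified form of the pattern above: a word (letters, digits, dash,
# underscore, single quote) OR any two characters from the emoticon set.
# TOKEN and tokenize are illustrative names, not from the original answer.
TOKEN = /[a-zA-Z0-9\-_']+|[:;=\[\]{}()<3dDpP]{2}/

def tokenize(text)
  # With no capture groups in the pattern, scan returns an array of strings.
  text.scan(TOKEN)
end

tokens = tokenize("Hello! :)  It's a good day to-day :D  'Aight? <3")
p tokens
# => ["Hello", ":)", "It's", "a", "good", "day", "to-day", ":D", "'Aight", "<3"]
```

The word alternative is tried first, so digits and the letters d, D, p, P inside ordinary words are consumed as part of the word. Punctuation outside both sets, such as ! and ?, is simply skipped between matches.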

Upvotes: 1

André Medeiros

Reputation: 810

It's not actually a regex, but does the job!

"Hello! :)  It's a good day to-day :D  'Aight? <3".split
=> ["Hello!", ":)", "It's", "a", "good", "day", "to-day", ":D", "'Aight?", "<3"]
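
Note that the split output above still carries the trailing punctuation on "Hello!" and "'Aight?". One possible way to get the list from the question is to trim each non-emoticon token after splitting. This is only a sketch: EMOTICON and split_and_strip are hypothetical names, the emoticon character set is borrowed from the other answer, and String#match? assumes Ruby 2.4 or later.

```ruby
# A token counts as an emoticon if it is exactly two characters from this
# set (borrowed from the regex answer; an assumption, not a complete list).
EMOTICON = /\A[:;=\[\]{}()<3dDpP]{2}\z/

def split_and_strip(text)
  text.split.map { |tok|
    if tok.match?(EMOTICON)
      tok                                    # keep emoticons untouched
    else
      # trim leading/trailing characters that are neither word characters
      # nor a single quote; dashes inside the word are left alone
      tok.gsub(/\A[^\w']+|[^\w']+\z/, "")
    end
  }.reject(&:empty?)
end

p split_and_strip("Hello! :)  It's a good day to-day :D  'Aight? <3")
# => ["Hello", ":)", "It's", "a", "good", "day", "to-day", ":D", "'Aight", "<3"]
```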

Upvotes: 0
