Reputation: 13600
For example:
Hello! :) It's a good day to-day :D 'Aight? <3
It would return:
One may assume all emoticons are two characters long. Also, if it helps, probably only 'forwards' emoticons will be encountered.
The case without emoticons is trivial, but handling them, as well as stripping punctuation from the other words, is tripping me up.
Is there a quick way to do this, besides calling .split and running a block to check each word individually?
Upvotes: 1
Views: 360
Reputation: 38456
The following regex should find any words (without punctuation other than a dash/single-quote/underscore), or a 2-character emoticon:
\s*(?:([a-zA-Z0-9\-\_\']+)|([\:\;\=\[\]\{\}\(\)\<3dDpP]{2}))\s*
Regex Explained:
\s* # any whitespace
(?:
([a-zA-Z0-9\-\_\']+) # any alpha-numeric character, dashes, underscores, single-quotes
|
([\:\;\=\[\]\{\}\(\)\<3dDpP]{2}) # any 2 characters commonly found in emoticons, including
# the digit 3 (for <3) and the letters d, D, p, P (for :D, :P)
)
\s* # any whitespace
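As a rough sketch of how this could be used from Ruby (the language the question appears to be about), the pattern can be passed to String#scan. The version below is a simplification, not the exact regex above: the capture groups, the surrounding \s* and the unneeded escapes are dropped so that scan returns a flat array of strings. The names TOKEN_PATTERN and tokenize are made up for illustration.

```ruby
# Simplified form of the regex above (an assumption, not the original):
# a "word" of letters, digits, underscores, apostrophes or dashes,
# or exactly two characters drawn from the emoticon character set.
TOKEN_PATTERN = /[A-Za-z0-9_'-]+|[;:=\[\]{}()<3dDpP]{2}/

# Hypothetical helper: returns every word or emoticon found in the text.
# Anything the pattern does not match (such as "!" or "?") is skipped.
def tokenize(text)
  text.scan(TOKEN_PATTERN)
end

p tokenize("Hello! :) It's a good day to-day :D 'Aight? <3")
# => ["Hello", ":)", "It's", "a", "good", "day", "to-day", ":D", "'Aight", "<3"]
```

Note that the leading apostrophe in 'Aight survives, because apostrophes are allowed inside words. The word alternative is tried first, so letters such as d or p inside ordinary words are never mistaken for part of an emoticon.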
Upvotes: 1
Reputation: 810
It's not actually a regex, but does the job!
"Hello! :) It's a good day to-day :D 'Aight? <3".split
=> ["Hello!", ":)", "It's", "a", "good", "day", "to-day", ":D", "'Aight?", "<3"]
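The plain split keeps punctuation attached to words ("Hello!", "'Aight?"), which the question wanted stripped. One possible way to extend it, sketched here under the question's assumption that emoticons are exactly two characters long, is to leave emoticon-shaped tokens alone and trim non-alphanumeric characters from both ends of everything else. EMOTICON and split_and_clean are hypothetical names, and the emoticon character set is borrowed from the regex answer above.

```ruby
# Assumed emoticon shape: exactly two characters from this set.
EMOTICON = /\A[;:=\[\]{}()<3dDpP]{2}\z/

# Hypothetical helper: split on whitespace, keep emoticons untouched,
# and strip leading/trailing punctuation from every other token.
def split_and_clean(text)
  tokens = text.split.map do |token|
    next token if token.match?(EMOTICON)
    token.gsub(/\A[^A-Za-z0-9]+|[^A-Za-z0-9]+\z/, "")
  end
  tokens.reject(&:empty?) # drop tokens that were punctuation only
end

p split_and_clean("Hello! :) It's a good day to-day :D 'Aight? <3")
# => ["Hello", ":)", "It's", "a", "good", "day", "to-day", ":D", "Aight", "<3"]
```

Unlike a regex that allows apostrophes anywhere in a word, this trims the leading apostrophe from 'Aight while keeping the inner ones in It's and the dash in to-day.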
Upvotes: 0