Working environment: Python 3.6.1
I've tried a number of methods outlined here on Stack Overflow and elsewhere on the internet, yet I still can't get this working.
The input could be any string, and the emojis may or may not be surrounded by whitespace, may be inside quotes, may follow a hashtag, and so on. These cases are giving me trouble.
This is what I have:
import re
import sys
sys.maxunicode
emoji_pattern = re.compile("["
u"\U0001F600-\U0001F64F"
u"\U0001F300-\U0001F5FF"
u"\U0001F680-\U0001F6FF"
u"\U0001F1E0-\U0001F1FF"
"]+", flags=re.UNICODE)
text = "" #This could be any text with or without emojis
text = emoji_pattern.sub(r'', text)
However, when displayed or printed, the text still contains the emojis.
text is a unicode string, i.e., type(text) returns <type 'unicode'>.
So what am I missing? Emojis still remain in the text. I would also prefer a method that anticipates these Unicode ranges being expanded in the future, so I would rather have one that simply keeps all regular characters.
Encoding the text as 'unicode_escape' gives the following:
b'[1/2] Can you see yourself as Prompto or Aranea?\\nGet higher quality images from our FB page \\n\\u2b07\\ufe0f\\u2026'
The raw unformatted text is:
[1/2] Can you see yourself as Prompto or Aranea?
Get higher quality images from our FB page
⬇️…
Upvotes: 2
Views: 1113
Reputation: 178179
Not sure what you think sys.maxunicode does, but your code works with Python 3.6. Are you sure you have all the emoji ranges covered?
import re
emoji_pattern = re.compile("["
u"\U0001F600-\U0001F64F"
u"\U0001F300-\U0001F5FF"
u"\U0001F680-\U0001F6FF"
u"\U0001F1E0-\U0001F1FF"
"]+", flags=re.UNICODE)
text = 'Actual text with emoji: ->\U0001F620\U0001F310\U0001F690\U0001F1F0<-'
print(text)
text = emoji_pattern.sub(r'', text)
print(text)
Output:
Actual text with emoji: ->😠🌐🚐🇰<-
Actual text with emoji: -><-
Note that flags=re.UNICODE is the default in Python 3.6, so it is not needed. Unicode strings are also the default, so u"xxxx" can just be "xxxx".
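To check the coverage question against the sample in the question, here is a minimal sketch that lists which non-ASCII code points fall outside the four ranges in the pattern above. The helper name uncovered, the shortened sample string, and the two extra ranges added to extended_pattern are assumptions made for illustration; they are not a complete emoji list.

```python
import re

# The four ranges used in the pattern above.
RANGES = [
    (0x1F600, 0x1F64F),
    (0x1F300, 0x1F5FF),
    (0x1F680, 0x1F6FF),
    (0x1F1E0, 0x1F1FF),
]

def uncovered(text):
    """Return the non-ASCII code points in text that fall outside RANGES."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if ord(ch) > 0x7F and not any(lo <= ord(ch) <= hi for lo, hi in RANGES)
    ]

# Tail of the sample from the question, taken from its unicode_escape dump.
sample = "FB page \n\u2b07\ufe0f\u2026"
print(uncovered(sample))  # ['U+2B07', 'U+FE0F', 'U+2026']

# Assumed extension: add the blocks that contain the arrow and its
# variation selector. U+2026 is ordinary punctuation, so it is kept.
extended_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"
    "\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF"
    "\U0001F1E0-\U0001F1FF"
    "\u2B00-\u2BFF"  # Miscellaneous Symbols and Arrows (contains U+2B07)
    "\uFE00-\uFE0F"  # variation selectors (contains U+FE0F)
    "]+"
)
cleaned = extended_pattern.sub("", sample)
print(ascii(cleaned))  # 'FB page \n\u2026'
```

None of the three non-ASCII characters in the sample lies in the original ranges, which would explain why the substitution appears to do nothing on that text.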
Upvotes: 1