Reputation: 9
I have a problem with entities when the text contains emoji.
This is my text:
βπ·πΆπ²π
±οΈπ¬βπ·πππ
abcdefghijklmnop
@aaabbbbbcccc
And these are the entities in the event:
entities=[ MessageEntityMention( length=13, offset=49 ), ]
And this is my code:
txt = event.raw_text
print(event.message.message)
if event.message.entities != None:
    i = 0
    c = len(event.message.entities)
    while i < c:
        a = event.message.entities[i]
        if (type(a) is MessageEntityMention) == True:
            print(a)
            o = a.offset
            l = a.length
            eo = o + l
            txt = txt.replace(event.raw_text[o:eo], "@example")
        i = i + 1
print(txt)
This should change the mention (@aaabbbbbcccc) to @example, but it does not. Instead it returns:
βπ·πΆπ²π
±οΈπ¬βπ·πππ
abcdefghijklmnop
@aaabbbbb@example
The problem is due to the emojis. It works fine when I delete the emojis.
What should I do?
Upvotes: 0
Views: 878
Reputation: 7141
The offsets are calculated on the text with surrogate pairs, so you need to add the surrogates with helpers.add_surrogate before doing any operations on it:
from telethon import helpers
text = helpers.add_surrogate(message.raw_text)
... # work with `text` and `message.entities` (offsets will be OK now)
text = helpers.del_surrogate(text) # remove the surrogate pairs when done
The method showcased by TheKill in their answer is a better fit depending on your situation, but this is how that works underneath in case you need it.
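To make the offset shift visible without Telethon, here is a minimal stdlib-only sketch. It assumes entity offsets and lengths count UTF-16 code units (which is why surrogates matter), and it slices the UTF-16 bytes directly instead of inserting surrogates into the str. The message text, the offset values, and the helper names utf16_len and utf16_slice are made up for illustration and are not part of Telethon.

```python
def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units, the unit entity offsets use."""
    return len(text.encode("utf-16-le")) // 2


def utf16_slice(text: str, offset: int, length: int) -> str:
    """Return the part of text covered by an entity's offset and length."""
    raw = text.encode("utf-16-le")  # 2 bytes per UTF-16 code unit
    return raw[2 * offset:2 * (offset + length)].decode("utf-16-le")


# Made-up message: two emoji (2 code units each), then a mention.
text = "\U0001F600\U0001F600 hi @someone"
offset, length = 8, 8  # what a mention entity would report (assumed values)

naive = text[offset:offset + length]       # plain str slicing is shifted
fixed = utf16_slice(text, offset, length)  # slicing in UTF-16 code units

print(len(text), utf16_len(text))  # the two lengths differ because of the emoji
print(repr(naive), repr(fixed))
```

Each emoji above counts as one character for Python but two code units for the entity offsets, so the plain slice starts too far to the right. That is the same symptom as in the question, where only the tail of the mention got replaced.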
Upvotes: 0
Reputation: 1069
from telethon import types

async def handler(event):
    content = event.raw_text
    for ent, txt in event.get_entities_text():
        # ent: the MessageEntity object
        # txt: the text covered by that entity
        if isinstance(ent, types.MessageEntityMention):  # check if it's a mention
            content = content.replace(txt, '@example')
Check out the Telethon docs for more information about get_entities_text.
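One caveat with str.replace is that it swaps every occurrence of the matched text, including ones that are not mentions. If that matters in your case, a possible alternative is to edit by offset instead. The sketch below is stdlib-only and assumes the spans are (offset, length) pairs in UTF-16 code units, for example built from e.offset and e.length of the mention entities. The function name replace_spans and the sample data are hypothetical.

```python
def replace_spans(text, spans, replacement):
    """Replace each (offset, length) span, given in UTF-16 code units."""
    raw = text.encode("utf-16-le")  # 2 bytes per UTF-16 code unit
    new = replacement.encode("utf-16-le")
    # Go right to left so earlier offsets are not shifted by a replacement.
    for offset, length in sorted(spans, reverse=True):
        raw = raw[:2 * offset] + new + raw[2 * (offset + length):]
    return raw.decode("utf-16-le")


# Made-up message: one emoji (2 code units), two mentions, and a
# trailing "x@bob" that is not covered by any entity.
text = "\U0001F600 @bob and @alice, or mail x@bob"
mentions = [(3, 4), (12, 6)]  # assumed entity offsets and lengths

result = replace_spans(text, mentions, "@example")
print(result)
```

Only the two spans are touched here, so the trailing "x@bob" stays as it is, whereas text.replace("@bob", "@example") would have changed it too.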
Upvotes: 0