a14stoner

Reputation: 87

How to escape text for formatting in Python

I have the following text.

"\*hello* * . [ }"

It should be escaped like this:

"\*hello\\* \* \\. \\[ \\}"

How can I do this with a Python regex?

Every special character (the special characters are: _, *, [, ], (, ), ~, `, >, #, +, -, =, |, {, }, ., !) must be escaped with a preceding \ character.

I tried it with this but then every character is escaped:

escape_chars = r'_*[]()~`>#+-=|{}.!'
return re.sub(f'([{re.escape(escape_chars)}])', r'\\\1', text)

Then the text is unformatted like this:

\*hello\* \* \. \[ \}

But it should be like this:

**hello** \* \. \[ \}

Some examples:

At \* \* \* only the middle one should be escaped.

At \{ \{ \} only the middle one should be escaped.

I need this for text formatting: https://core.telegram.org/bots/api#markdownv2-style

Upvotes: -2

Views: 5864

Answers (2)

roskakori

Reputation: 3346

I wanted something without adding complex external dependencies, so here's a function that escapes all characters according to "Characters You Can Escape":

_MARKDOWN_CHARACTERS_TO_ESCAPE = set(r"\`*_{}[]<>()#+-.!|")

def escaped_markdown(text: str) -> str:
    return "".join(
        f"\\{character}" if character in _MARKDOWN_CHARACTERS_TO_ESCAPE else character 
        for character in text
    )

Example:

assert escaped_markdown("**[text](https://example.com)**") == (
    r"\*\*\[text\]\(https://example\.com\)\*\*"
)

This requires Python 3.6 or later.
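
For Telegram's MarkdownV2 specifically, the same idea could be applied to the character list given in the question instead of the generic Markdown set. The following is only a sketch under that assumption: the names `_MARKDOWN_V2_SPECIAL`, `_ESCAPE_TABLE` and `escaped_markdown_v2` are made up for illustration, and it uses `str.translate` rather than a generator expression. It escapes every listed character unconditionally and does not try to keep intentional formatting intact.

```python
# Special characters as listed in the question for Telegram MarkdownV2
# (assumption: this list is complete for the use case at hand).
_MARKDOWN_V2_SPECIAL = "_*[]()~`>#+-=|{}.!"

# Translation table mapping each special character to its
# backslash-escaped form.
_ESCAPE_TABLE = str.maketrans(
    {character: "\\" + character for character in _MARKDOWN_V2_SPECIAL}
)


def escaped_markdown_v2(text: str) -> str:
    """Return text with every listed special character backslash-escaped."""
    return text.translate(_ESCAPE_TABLE)


print(escaped_markdown_v2("*hello* * . [ }"))
```

Applied to the text from the question, this yields the fully escaped variant, so it is suitable for plain user input but not for text that already contains intended markup.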

Upvotes: 0

CallMeStag

Reputation: 7060

Since you tagged python-telegram-bot, I'm gonna point you to the escape_markdown helper function. The source code for it is available in the python-telegram-bot repository.

Maybe this helps you. However, I have to agree with Chris: It's not clear to me what you actually want to achieve.

EDIT:

The use case seems to be that users should be allowed to set some kind of template messages, which can have dynamic input. OP did not (yet) explain what exactly those templates look like, so I'll just make up an example. Let's say the user wants to specify a welcome message of the format

Hello_there, {username}!

Where Hello_there is italic and {username} is replaced with the corresponding string at runtime and should be displayed bold, including the !.

I see two ways to approach this.

  1. The user sends the message as formatted text (i.e. the bot receives a message "Hello_there, {username}!"). In this case, one can store the template by simply storing update.effective_message.text_markdown(_v2)/text_html. See Message.text_html. Then at runtime, all you need to do is send_message(template.format(username=escaped_username), parse_mode=...). Note that here escaped_username is a string containing the username with special characters escaped. This can be achieved with escape_markdown for Markdown formatting, or with html.escape from the standard library for HTML formatting.

  2. The user sends the text with markup characters. Sticking to Markdown formatting for the example, the bot would receive a message saying _Hello_there_, *{username}!*. Now to convert this to a template, you'd have to somehow escape the relevant characters. In this case this should be _Hello\_there_, *escaped_username\!* at runtime. In this scenario I don't see a safe way to decide what to escape and what not to. While you can do some regexing to e.g. convert *{username}!* to *{username}\!*, how would you know if the user wants "Hello there_" or "Hello_there"?

I therefore highly recommend the first approach.
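
To make the first approach a bit more concrete, here is a rough, stdlib-only sketch. The templates, the function names `escape_markdown_v2` and `render`, and the sample usernames are all made up for illustration; in a real bot the template would come from the stored message text and the escaping could be done with the library's escape_markdown helper instead of the small regex used here. Only the dynamic value is escaped, while the template is assumed to already be valid markup.

```python
import html
import re

# Special characters as listed in the question for MarkdownV2.
_SPECIAL = r"_*[]()~`>#+-=|{}.!"


def escape_markdown_v2(text: str) -> str:
    """Stand-in for an escaping helper: backslash-escape special characters."""
    return re.sub(f"([{re.escape(_SPECIAL)}])", r"\\\1", text)


def render(template: str, username: str) -> str:
    """Fill an already-formatted template with an escaped username."""
    return template.format(username=escape_markdown_v2(username))


# Hypothetical stored template that is already valid MarkdownV2:
# "Hello_there" in italics, the username and "!" in bold.
markdown_template = r"_Hello\_there_, *{username}\!*"
print(render(markdown_template, "john_doe.1"))

# The HTML flavour works the same way, using html.escape for the value.
html_template = "<i>Hello_there</i>, <b>{username}!</b>"
print(html_template.format(username=html.escape("<Tom & Jerry>")))
```

The resulting strings would then be passed to send_message with the matching parse_mode.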


Disclaimer: I'm currently the maintainer of python-telegram-bot

Upvotes: 6
