Bobbby

Reputation: 49

Python .replace() function, removing backslash in certain way

I have a huge string that contains emoticons like "\u201d", as well as words wrapped in backslashes like "\advance\".

All I need is to remove the backslashes so that:

- \u201d stays \u201d
- \united\ becomes united

(The stray backslashes break the upload of the string to a BigQuery database.)

I know it should be something like this:

string.replace('\\', '')

but I'm not sure how to keep the \u201d emoticons intact.
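To make the problem concrete, here is a small sketch, assuming the backslashes are literal characters in the data (as they would be in raw JSON text), showing why a plain replace is not enough:

```python
# Sample data with literal backslashes (an assumed example, written with escaped '\\').
s = '\\u201d \\united\\'

# A plain replace removes every backslash, including the ones that
# begin the \uXXXX escapes the question wants to keep.
naive = s.replace('\\', '')
print(naive)  # u201d united
```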

ADDITIONAL: Example of Unicode emoticons

Upvotes: 3

Views: 226

Answers (4)

BlueSheepToken

Reputation: 6099

You can split on every '\' and then use a regex to add a leading '\' back in front of your emoticons.

import re

s = '\\advance\\\\united\\ud83d\\udc9e\\u201c\\u2744\\ufe0f\\u2744\\ufe0f\\u2744\\ufe0f'
print(re.sub(r'(u[a-f0-9]{4})', lambda m: '\\' + m.group(0), ''.join(s.split('\\'))))

Since your emoticons are 'u' followed by 4 hex digits, 'u[a-f0-9]{4}' will match them all, and you just have to add the leading backslashes back.
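To see what the pattern picks up, here is a quick check with re.findall on a backslash-free string. Note that the class [a-f0-9] is lowercase only, so an escape written as \u201D would be missed unless you add re.IGNORECASE (whether your data contains uppercase hex is an assumption worth checking):

```python
import re

joined = 'advanceunitedud83du201cu201D'

# Lowercase-only hex class: the uppercase 'u201D' is not matched.
print(re.findall(r'u[a-f0-9]{4}', joined))  # ['ud83d', 'u201c']

# Case-insensitive variant also picks up 'u201D'.
print(re.findall(r'u[a-f0-9]{4}', joined, flags=re.IGNORECASE))  # ['ud83d', 'u201c', 'u201D']
```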

First, you delete every '\' in the string with either ''.join(s.split('\\')) or s.replace('\\', '').
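The two deletion expressions give the same result; str.replace is simply the more direct of the two. A quick check on a shortened sample:

```python
s = '\\advance\\\\united\\ud83d'

# Both expressions delete every backslash in the string.
via_split = ''.join(s.split('\\'))
via_replace = s.replace('\\', '')
print(via_split == via_replace, via_replace)  # True advanceunitedud83d
```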

Then we match every "emoticon" with the regex u[a-f0-9]{4} (a 'u' followed by 4 hex digits).

Finally, re.sub replaces every match with the same text preceded by a backslash (written '\\' in Python source).
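As an alternative (a different technique from the split-and-restore above, not part of the original answer), a single re.sub with a negative lookahead can delete only the backslashes that are not followed by 'u' and four hex digits, so the escapes are never stripped in the first place. A minimal sketch, assuming every backslash that begins a \uXXXX sequence should be kept; the helper name is hypothetical:

```python
import re

# Matches a backslash that is NOT followed by 'u' + 4 hex digits (either case).
STRAY_BACKSLASH = re.compile(r'\\(?!u[0-9a-fA-F]{4})')

def strip_stray_backslashes(text: str) -> str:
    """Remove backslashes except those that begin a \\uXXXX escape."""
    return STRAY_BACKSLASH.sub('', text)

s = '\\advance\\ \\united\\ \\ud83d\\udc9e\\u201c'
print(strip_stray_backslashes(s))  # advance united \ud83d\udc9e\u201c
```

Like the original pattern, this cannot tell a real escape from a word that happens to start with 'u' plus four hex letters.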

Upvotes: 1

alec_djinn

Reputation: 10779

You could strip all the backslashes and then simply add one back in front of your string if it contains \u followed (somewhere after it) by at least one digit.

import re

def clean(s):
    re1 = r'(\\)'  # a literal backslash "\"
    re2 = r'(u)'   # the letter "u"
    re3 = r'.*?'   # non-greedy match on filler
    re4 = r'(\d)'  # any single digit

    rg = re.compile(re1 + re2 + re3 + re4, re.IGNORECASE | re.DOTALL)
    m = rg.search(s)

    # Strip every backslash; restore one in front if s looked like a \u escape.
    if m:
        r = '\\' + s.replace('\\', '')
    else:
        r = s.replace('\\', '')
    return r


a = '\\u123'
b = '\\united\\'
c = '\\ud83d'

>>> print(a, b, c)
\u123 \united\ \ud83d

>>> print(clean(a), clean(b), clean(c))
\u123 united \ud83d
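The four pattern pieces concatenate into the single pattern \\(u).*?(\d), and since clean() only uses the match as a yes/no test, the capturing groups are not needed. A more compact sketch of the same check, written with a raw string (clean_compact is a hypothetical name):

```python
import re

# Same match behavior as re1 + re2 + re3 + re4: a backslash, 'u', then a digit later on.
ESCAPE_HINT = re.compile(r'\\u.*?\d', re.IGNORECASE | re.DOTALL)

def clean_compact(s: str) -> str:
    stripped = s.replace('\\', '')
    # Put one backslash back in front if the original looked like a \u escape.
    return '\\' + stripped if ESCAPE_HINT.search(s) else stripped

print(clean_compact('\\u123'), clean_compact('\\united\\'), clean_compact('\\ud83d'))
# \u123 united \ud83d
```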

Of course, if multiple entries are on the same line, you have to split your string first:

string = '\\u123 \\united\\ \\ud83d'
clean_string = ' '.join([clean(word) for word in string.split()])
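Splitting with str.split() and re-joining with ' ' collapses runs of whitespace (tabs, double spaces, newlines) into single spaces. If the layout of the huge string matters, one option is to apply the cleaning function to each whitespace-free token in place with re.sub. In this sketch clean_token is a hypothetical stand-in for clean() above, reimplemented so the block runs on its own:

```python
import re

def clean_token(word: str) -> str:
    # Stand-in for clean(): strip backslashes, then restore one in front
    # if the token contained a \u followed somewhere by a digit.
    stripped = word.replace('\\', '')
    if re.search(r'\\u.*?\d', word, re.IGNORECASE | re.DOTALL):
        return '\\' + stripped
    return stripped

text = '\\u123  \\united\\\n\\ud83d'

# \S+ matches each run of non-whitespace; the whitespace between runs is left untouched.
cleaned = re.sub(r'\S+', lambda m: clean_token(m.group(0)), text)
print(repr(cleaned))
```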

Upvotes: 1

MikkelDalby

Reputation: 152

You can do it as simply as this:

text = text[:-1]

Here you just remove the last character.
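Slicing is the safe way to drop only the last character: str.replace(old, new) replaces every occurrence of old, so a call like text.replace(text[-1], '') would remove all copies of the last character. A quick comparison, including rstrip('\\'), which removes only trailing backslashes:

```python
text = '\\advance\\'

all_removed = text.replace(text[-1], '')  # every backslash goes
last_dropped = text[:-1]                  # only the final character goes
trailing_stripped = text.rstrip('\\')     # only trailing backslashes go

print(all_removed, last_dropped, trailing_stripped)
# advance \advance \advance
```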

Upvotes: 0

DeshDeep Singh

Reputation: 1843

You can use this simple function to replace the last occurrence of a character, here the backslash:

def replace_character(s, old, new):
    return (s[::-1].replace(old[::-1],new[::-1], 1))[::-1]

print(replace_character('\\advance\\', '\\', ''), replace_character('\u201d', '\\', ''))

Output:

\advance ”
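The double reversal works, but str.rpartition splits around the last occurrence directly, without building three reversed copies. A sketch of an equivalent helper for a non-empty old (replace_last is a hypothetical name, not from the original answer), checked against the answer's function:

```python
def replace_character(s, old, new):
    # The answer's approach: reverse, replace the first match, reverse back.
    return (s[::-1].replace(old[::-1], new[::-1], 1))[::-1]

def replace_last(s: str, old: str, new: str) -> str:
    # rpartition splits around the LAST occurrence of old;
    # sep is '' when old does not occur at all.
    head, sep, tail = s.rpartition(old)
    if not sep:
        return s
    return head + new + tail

for sample in ['\\advance\\', '\\united\\', 'no backslash']:
    assert replace_last(sample, '\\', '') == replace_character(sample, '\\', '')
    print(replace_last(sample, '\\', ''))
```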

Upvotes: 0
