Clock Slave

Reputation: 7967

Remove \u from string?

I have a few words in a list that are of the form '\uword'. I want to replace the '\u' with an empty string. I looked around on SO, but nothing has worked for me so far. I tried converting to a raw string using "%r" % word, but that didn't work. I also tried word.encode('unicode-escape'), but haven't gotten anywhere. Any ideas?

EDIT

Adding code

word = '\u2019'
word.encode('unicode-escape')
print(word) # error

word = '\u2019'
word = "%r"%word
print(word) # error

Upvotes: 2

Views: 8436

Answers (4)

vijay athithya

Reputation: 1529

Given that you are dealing only with strings, you can simply convert the value with the built-in str() function (the u'...' display below is Python 2 output):

>>> string = u"your string"
>>> string
u'your string'
>>> str(string)
'your string'

Guess this will do!
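For readers on Python 3, here is a minimal sketch of what the same steps do there. It assumes Python 3.3 or later, where the u prefix is still accepted but has no effect, so str() leaves the text unchanged. The variable name s is only for illustration.

```python
s = u"your string"   # the u prefix is accepted in Python 3.3+ but changes nothing

# In Python 3 every string literal is already a Unicode str
assert type(s) is str
assert s == "your string"

# str() on a str gives back equal text, so nothing is stripped or converted
converted = str(s)
print(repr(converted))  # 'your string'
```

Note that the u prefix is part of the literal syntax, not of the string's contents, so it is a different thing from a '\u' escape inside the text.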

Upvotes: 2

Clock Slave

Reputation: 7967

I was mistaken in assuming that the .encode method of strings modifies the string in place, the way the .sort() method of a list does. According to the documentation:

The opposite method of bytes.decode() is str.encode(), which returns a bytes representation of the Unicode string, encoded in the requested encoding.

So the return value has to be captured and decoded back into a string before it can be searched:

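def remove_u(word):
    # Escape non-ASCII characters: the character '\u2019' becomes the six-character text \u2019
    word_u = word.encode('unicode-escape').decode('utf-8', 'strict')
    if r'\u' in word_u:
        # Keep what follows the first \u marker, up to the next marker if there is one
        return word_u.split('\\u')[1]
    return word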

vocabulary_ = [remove_u(each_word) for each_word in vocabulary_]
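To make the point about return values concrete, here is a small sketch (Python 3 assumed; the names escaped, text and stripped are just for illustration). It shows that encode leaves the original string alone and that the result has to be assigned before it can be used.

```python
word = '\u2019'

# str.encode returns a new bytes object; word itself is not modified
escaped = word.encode('unicode-escape')
assert word == '\u2019'
assert escaped == b'\\u2019'

# Decoding the escaped bytes gives plain ASCII text that can be edited
text = escaped.decode('ascii')
stripped = text.replace('\\u', '')
print(stripped)  # 2019
```

Unlike split('\\u')[1], the replace call here removes every \u marker in the text rather than keeping only the piece after the first one, so pick whichever matches the data.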

Upvotes: 3

logi-kal

Reputation: 7880

If I have understood correctly, you don't need regular expressions. In Python 2, just try:

>>> # string = '\u2019'
>>> char = string.decode('unicode-escape')
>>> print format(ord(char), 'x')
2019
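The snippet above uses Python 2 syntax (the print statement and str.decode). A rough Python 3 equivalent might look like the following sketch; the names code, raw and decoded are only illustrative.

```python
char = '\u2019'  # in Python 3 this literal is already a single character

# ord gives the code point; format(..., 'x') renders it as lowercase hex
code = format(ord(char), 'x')
print(code)  # 2019

# If the text really contains a backslash followed by 'u' (six characters),
# turn it into bytes first and decode the escape sequence
raw = r'\u2019'
decoded = raw.encode('ascii').decode('unicode-escape')
assert decoded == char
```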

Upvotes: 1

Nerade

Reputation: 115

Because you are having problems with encodings and Unicode, it would help to know which version of Python you are using. I'm not sure I understand you correctly, but this should do the trick:

string = r'\uword'
string = string.replace(r'\u', '')  # str.replace returns a new string, so assign the result
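One caveat, sketched below under the assumption of Python 3: replace only helps when the text literally contains a backslash followed by u. If the word holds an actual escaped character such as '\u2019', there is no backslash in it to remove. The variable names are illustrative.

```python
raw = r'\uword'            # literal backslash, 'u', then 'word': six characters
assert len(raw) == 6
cleaned = raw.replace(r'\u', '')
print(cleaned)  # word

escaped_char = '\u2019'    # one character; it contains no backslash at all
assert len(escaped_char) == 1
unchanged = escaped_char.replace(r'\u', '')
assert unchanged == escaped_char
```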

Upvotes: -2
