Dani

Reputation: 183

Python lxml: method to check whether a variable contains non-ASCII characters and, if so, return its unicode value

I am trying to create XML in Python using lxml. The value of a variable from an external data source is used as a value in my XML file. If the value contains a non-ASCII character such as €, this results in:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters.

Question: I want a Python method that checks whether the value in the variable contains non-ASCII characters and, if so, returns the corresponding unicode value, so that I can use it in my XML. I am not looking for input_string = u'string €'; as I said, the variable gets its value from an external data source. Please help.

Upvotes: 1

Views: 307

Answers (1)

MaximTitarenko

Reputation: 886

It seems you're looking for this
(assuming Python 2.7 and input data of <type 'str'>):

# Convert input_string from 'str' to 'unicode'
# only if it contains non-ASCII bytes.

def decode_if_no_ascii(input_string):
    try:
        input_string.decode('ascii')
    except UnicodeDecodeError:
        # 'utf-8' should match the actual encoding of input_string;
        # in a particular case it could be 'latin_1' or 'cp1252'.
        input_string = input_string.decode('utf-8')
    return input_string

Let's test the function:

# 1. ASCII str
input_string = 'string' 
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'str'> 'string' string  
# ==> still 'str', no changes 

# 2. non-ASCII str
input_string = 'string €'
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'unicode'> u'string \u20ac' string € 
# ==> converted to 'unicode'
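
On Python 3, where data from an external source usually arrives as bytes, the same idea can be sketched as follows. The function name to_text and the fallback order (utf-8, then cp1252) are assumptions for illustration only; the encodings to try depend on what the data source actually produces.

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Return value as str, decoding bytes with the first encoding that fits.

    Hypothetical helper: the encoding list is an assumption, not a rule.
    """
    if isinstance(value, str):
        return value  # already text, nothing to decode
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("could not decode value with any of: " + ", ".join(encodings))
```

Trying utf-8 first is deliberate: it rejects most byte sequences that are not valid UTF-8, whereas single-byte encodings such as cp1252 accept almost anything and would hide a wrong guess.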

Is this what you are looking for?
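
Once the value is text, it can be assigned to an element and serialized. The sketch below is written for Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in, because lxml.etree provides Element, SubElement and tostring with a compatible interface. The tag names and the UTF-8 assumption about the source are made up for illustration.

```python
import xml.etree.ElementTree as ET  # stand-in; lxml.etree offers the same calls used here

raw = b"string \xe2\x82\xac"        # bytes as they might arrive from the external source
text = raw.decode("utf-8")          # assumption: the source is UTF-8 encoded

root = ET.Element("root")           # "root" and "item" are hypothetical tag names
item = ET.SubElement(root, "item")
item.text = text                    # text (not raw bytes) goes into the tree

xml_bytes = ET.tostring(root, encoding="utf-8")  # serialized document as bytes
```

Decoding at the boundary, as soon as the value is read, keeps the rest of the code working with text only and leaves the choice of output encoding to the serialization step.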

Upvotes: 1
