Reputation: 183
I am trying to create XML in Python using lxml. The value of a variable from an external data source is written into my XML file. If the value contains a non-ASCII character such as €, this results in:
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters.
Question: I want a method in Python that checks whether the value of the variable contains non-ASCII characters and, if so, returns the corresponding unicode value, so that I can use it in my XML. I am not looking for input_string = u'string €'. As I said, the variable gets its value from an external data source. Please help.
Upvotes: 1
Views: 307
Reputation: 886
It seems you're looking for this (assuming Python 2.7 and input data of <type 'str'>):
# function that converts input_string from 'str' to 'unicode'
# only if input_string contains non-ASCII bytes
def decode_if_no_ascii(input_string):
    try:
        input_string.decode('ascii')
    except UnicodeDecodeError:
        # 'utf-8' should match the encoding of input_string;
        # it could be 'latin_1' or 'cp1252' in a particular case
        input_string = input_string.decode('utf-8')
    return input_string
Let's test the function:
# 1. ASCII str
input_string = 'string'
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'str'> 'string' string
# ==> still 'str', no changes
# 2. non-ASCII str
input_string = 'string €'
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'unicode'> u'string \u20ac' string €
# ==> converted to 'unicode'
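For anyone on Python 3, where str is already Unicode and external data usually arrives as bytes, the same idea can be sketched roughly as below. The name to_xml_text is hypothetical, UTF-8 is only an assumed encoding for the external source, and silently dropping characters that XML 1.0 does not allow (the "no NULL bytes or control characters" part of the error message) is one possible choice, not the only one.

```python
import re

# Matches every character outside the ranges XML 1.0 allows:
# tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_xml_text(value, encoding="utf-8"):
    """Return `value` as text with XML-incompatible characters removed.

    `encoding` is an assumption about the external data source; it must
    match the real encoding (it could be 'latin_1' or 'cp1252' instead).
    """
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        value = value.decode(encoding)
    # Drop NULL bytes and other control characters that XML 1.0 forbids.
    return _XML_ILLEGAL.sub("", value)


if __name__ == "__main__":
    print(repr(to_xml_text(b"string \xe2\x82\xac")))
    print(repr(to_xml_text("a\x00b\x07c")))
```

The returned value is always text, so it can be assigned to an element's text or attribute without checking for ASCII first. If the encoding is wrong, the decode step raises UnicodeDecodeError instead of silently producing garbage, which is usually easier to debug.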
Is this what you are looking for?
Upvotes: 1