jay

Reputation: 1463

Convert unicode to xml string

I have a unicode string, defined as

data= u'<project> <DataRuleDefinitions> <DataRuleDefinition name="dummy_def">
 <ExecutableRule name="rule_dummy_def" excludeDuplicates="false" 
  folder="14_A_Loan_Loss_Projections;All"><Bindings> <Binding var="field"><Columnname="CLL_IA_SNAPSHOT.prod_files.prod_files."CLL_201706.csv".AccrualStatusChangeDate"/> </Binding> </Bindings> </ExecutableRule> </DataRuleDefinition> 
 </DataRuleDefinitions> </project>'

I tried to parse it as XML like this:

import xml.etree.ElementTree as ET
tree = ET.fromstring(data)

But I am encountering the following error:
ParseError: not well-formed (invalid token): line 1, column 326

Can I kindly get help to rectify this error and convert this unicode to xml?

Upvotes: 0

Views: 953

Answers (1)

Vap0r

Reputation: 2616

There are two problems, and neither has anything to do with converting unicode to XML. As the ParseError says, the XML itself is not well-formed.

The problem is with the following line of XML:

<Columnname="CLL_IA_SNAPSHOT.prod_files.prod_files."CLL_201706.csv".AccrualStatusChangeDate"/>

An element name cannot contain an equals sign. Assign the data to an attribute instead, here called val (the attribute name can be any valid XML name):

<Columnname val="CLL_IA_SNAPSHOT.prod_files.prod_files."CLL_201706.csv".AccrualStatusChangeDate"/>

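There is still a problem: the attribute value is delimited by double quotes, but it also contains double quotes, so the parser sees the value end at the first inner quote. We can remove the inner quotes, escape them as &quot;, or switch them to single quotes: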

<Columnname val="CLL_IA_SNAPSHOT.prod_files.prod_files.'CLL_201706.csv'.AccrualStatusChangeDate"/>
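If the inner double quotes have to stay, escaping is the other option. Here is a minimal sketch, assuming Python 3 and only the standard library; the val attribute follows the example above, and the shortened Binding document is just for illustration. The parser turns &quot; back into a double quote, and xml.sax.saxutils.quoteattr can build a correctly quoted attribute from a raw value:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Raw value containing double quotes, as in the question.
raw_value = 'CLL_IA_SNAPSHOT.prod_files.prod_files."CLL_201706.csv".AccrualStatusChangeDate'

# Option 1: escape the inner double quotes by hand as &quot;.
escaped_xml = (
    '<Binding var="field">'
    '<Columnname val="CLL_IA_SNAPSHOT.prod_files.prod_files.'
    '&quot;CLL_201706.csv&quot;.AccrualStatusChangeDate"/>'
    '</Binding>'
)
column = ET.fromstring(escaped_xml).find("Columnname")

# Option 2: let quoteattr choose the quoting and add the surrounding quotes.
generated_xml = (
    '<Binding var="field"><Columnname val=%s/></Binding>' % quoteattr(raw_value)
)
generated_column = ET.fromstring(generated_xml).find("Columnname")

# Both attributes come back with the original double quotes intact.
print(column.get("val"))
print(generated_column.get("val"))
```

Either way, the value read back from the val attribute is the original string, inner double quotes included.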

Now the XML should parse without errors. If you would like to check that your XML is well-formed, you can use one of the many online XML pretty printers, which will report any markup that is not well-formed. Good luck!
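The same check can also be done locally. This is a rough sketch, assuming Python 3; check_well_formed is a hypothetical helper name, and the two short documents are made up for illustration. It relies on ET.ParseError carrying a position attribute with the (line, column) of the failure:

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, (line, column))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the bad token.
        return False, err.position
    return True, None


# Hypothetical inputs: an equals sign in an element name vs. a proper attribute.
bad_result = check_well_formed('<a><Columnname="x"/></a>')
good_result = check_well_formed('<a><Columnname val="x"/></a>')

print(bad_result)
print(good_result)
```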

Upvotes: 1
