Parse xml file in pandas

Question

I have an XML file called "LogReg.xml" that contains some information about a logistic regression. I am interested in the names of the features and their coefficients (I'll explain in more detail below):



    
        
        2022-02-15T09:44:54Z
    
    
        PMMLPipeline(steps=[('classifier', LogisticRegression())])

I have parsed it using this code:

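import pandas as pd
from lxml import objectify

path = 'LogReg.xml'

# objectify.parse accepts a file path directly, so no open file handle is left behind
parsed = objectify.parse(path)
root = parsed.getroot()

data = []

# collect the tag and text of every child of each RegressionTable
for elt in root.RegressionModel.RegressionTable:
    el_data = {}
    for child in elt.iterchildren():
        el_data[child.tag] = child.text
    data.append(el_data)

perf = pd.DataFrame(data)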

I am interested in parsing this bit:

so that I can build the following dictionary:

myDict = {
"const" : 0.8013433785974717,
"grade" : 0.9010481046582982,
"emp_length" : 0.9460686056314133,
"dti" : 0.5117062988491518,
"Orig_FicoScore" : 0.07944303372859234,
"inq_last_6mths" : 0.20516234445402765,
"acc_open_past_24mths" : 0.4852503249658917,
"mort_acc" : 0.6673203078463711,
"mths_since_recent_bc" : 0.1962158305958366,
"num_rev_tl_bal_gt_0" : 0.12964661294856686,
"percent_bc_gt_75" : 0.04534570018290847
}

Basically, in the dictionary the key is the name of the feature and the value is its coefficient in the logistic regression.

Please can anyone help me with the code?

Jack Fleeting · Accepted Answer

I'm not sure you need pandas for this, but you do need to handle the namespaces in your XML.

Try something along these lines:

myDict = {}
# register the namespace
ns = {'xx': 'http://www.dmg.org/PMML-4_4'}

# you could collapse the next two into one line, but I believe it's clearer this way
rt = root.xpath('//xx:RegressionTable[.//xx:NumericPredictor]', namespaces=ns)[0]
nps = rt.xpath('./xx:NumericPredictor', namespaces=ns)

for pred in nps:
    # attribute values are strings, so convert the coefficient to float
    myDict[pred.attrib['name']] = float(pred.attrib['coefficient'])
myDict

The output should be your expected output.
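
For anyone who wants to try the namespace handling without the original file, here is a minimal, self-contained sketch. It makes two assumptions that are not from the question: the SAMPLE document is an invented stand-in for "LogReg.xml" (only the namespace URI and two of the coefficient values are taken from this page), and it uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra installs. The helper name coefficients is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for LogReg.xml; the real file's layout may differ.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="0.0" targetCategory="1">
      <NumericPredictor name="grade" coefficient="0.9010481046582982"/>
      <NumericPredictor name="dti" coefficient="0.5117062988491518"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>"""

# The prefix "xx" is arbitrary; only the URI has to match the document.
NS = {"xx": "http://www.dmg.org/PMML-4_4"}


def coefficients(xml_text):
    """Return {feature name: coefficient} from the first RegressionTable
    that has NumericPredictor children (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for table in root.findall(".//xx:RegressionTable", NS):
        predictors = table.findall("xx:NumericPredictor", NS)
        if predictors:
            # Attribute values are strings, so convert coefficients to float.
            return {p.get("name"): float(p.get("coefficient")) for p in predictors}
    return {}


my_dict = coefficients(SAMPLE)
print(my_dict)
```

Without the NS mapping, a lookup for a plain "RegressionTable" tag finds nothing, because every element in the document lives in the default PMML namespace.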
