lost
lost

Reputation: 377

Python XML Parser Loading Numbers as strings

I have an xml file with about 15000 values in it. I read it using

import xml.etree.ElementTree as ET

An example of how I pulled out a value is

def get_year1_quantity(element): return element[5][0].text I got all the data and put the different values into list, then put them into a pandas dataframe.

I tried to filter the data by:

filtered=this[this.Year1_Q > this.Year2_Q]

Everything went well in till I looked at the data in a csv file. Only a few of the values are loaded as strings. They go into excel as floats or int depending on the values. I tried

Year1_Q=[map(int, x) for x in Year1_Q] Year1_P=[map(float, x) for x in Year1_P] Year2_Q=[map(int, x) for x in Year2_Q] Year2_P=[map(float, x) for x in Year2_P]

but it gave me an error, Year1_Q=[map(int, x) for x in Year1_Q]

ValueError: could not convert string to int:

It does not make sense to me that some numbers are loaded as numbers, while others are loaded as strings. They are identical in the xml file.

Has anyone had this problem before?

I tried this[columns]=this[columns].astype(float) and got invalid literal for float(): 6,625.00

Upvotes: 1

Views: 1478

Answers (1)

pepr
pepr

Reputation: 20794

XML is a text file and only you know whether it should be interpreted as string or as a number. When you need it as a number in Python, you have to explicitly convert it (as you do with map). The CSV is also a text format.

You can use

try:
    Year1_Q=[map(int, x) for x in Year1_Q]
except ValueError:
    print(repr(Year1_Q))

to make the problematic value visible.

The CSV is not a native Excel format. Excel may be the tool to be blamed. The result may differ when you double click on the CSV file or when you start Excel and open the CSV file via menu (this is real observation at least for some older versions of Excel).

Upvotes: 1

Related Questions