Reputation: 80346
I have an XML file with thousands of lines like:
<Word x1="206" y1="120" x2="214" y2="144" font="Times-Roman" style="font-size:22pt">WORD</Word>
I want to convert it (all its attributes) to a pandas DataFrame. To do that, I could loop through the file using Beautiful Soup and insert the values row by row, or build lists to be inserted as columns. However, I would like to know if there's a more Pythonic way of accomplishing this. Thank you in advance.
Code example:
x1list = []
x2list = []
for word in soup.page.findAll('word'):
    x1list.append(int(word['x1']))
    x2list.append(int(word['x2']))
df = DataFrame({'x1': x1list, 'x2': x2list})
Upvotes: 5
Views: 7959
Reputation: 212835
Try this:
DataFrame.from_records([(int(word['x1']), int(word['x2']))
                        for word in soup.page.findAll('word')],
                       columns=('x1', 'x2'))
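Since the question asks for all attributes rather than just x1 and x2, here is a minimal sketch of one way to do that. It is not the only approach. It uses the standard library's xml.etree.ElementTree instead of Beautiful Soup, and the two-row sample, the `<page>` wrapper (guessed from `soup.page` in the question), and the extra "text" column are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data; the <page> root is assumed from `soup.page` in the question.
xml_text = """<page>
<Word x1="206" y1="120" x2="214" y2="144" font="Times-Roman" style="font-size:22pt">WORD</Word>
<Word x1="220" y1="120" x2="260" y2="144" font="Times-Roman" style="font-size:22pt">NEXT</Word>
</page>"""

root = ET.fromstring(xml_text)

# One dict per <Word> element: every attribute, plus the element text
# under an assumed "text" key.
rows = [{**word.attrib, "text": word.text} for word in root.iter("Word")]

# A list of dicts becomes one row per dict and one column per key.
df = pd.DataFrame(rows)

# Attribute values are parsed as strings, so cast the coordinate columns.
coord_cols = ["x1", "y1", "x2", "y2"]
df = df.astype({col: int for col in coord_cols})
```

With Beautiful Soup, each tag's `word.attrs` is likewise a dict of its attributes, so the same list-of-dicts pattern should carry over to the loop in the question.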
Upvotes: 3