A S

Reputation: 1235

Extracting similar XML attributes with BeautifulSoup

Let's assume I have the following XML:

<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
    <!-- Valid from 2017-07-29T08:00:00 to 2017-07-29T09:00:00 -->
    <symbol number="4" numberEx="4" name="Cloudy" var="04"/>
    <precipitation value="0"/>
    <!-- Valid at 2017-07-29T08:00:00 -->
    <windDirection deg="300.9" code="WNW" name="West-northwest"/>
    <windSpeed mps="1.3" name="Light air"/>
    <temperature unit="celsius" value="15"/>
    <pressure unit="hPa" value="1002.4"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
    <!-- Valid from 2017-07-29T09:00:00 to 2017-07-29T10:00:00 -->
    <symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
    <precipitation value="0"/>
    <!-- Valid at 2017-07-29T09:00:00 -->
    <windDirection deg="293.2" code="WNW" name="West-northwest"/>
    <windSpeed mps="0.8" name="Light air"/>
    <temperature unit="celsius" value="17"/>
    <pressure unit="hPa" value="1002.6"/>
</time>

I want to collect the time from attribute, the symbol name and the temperature value, and then print them as "time from: symbol name, temperature value", like this: 2017-07-29, 08:00:00: Cloudy, 15°.
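Here is a minimal sketch of just the formatting step, in Python 3. The helper name format_line is hypothetical, and it assumes the from attribute always has the form 2017-07-29T08:00:00. Splitting the timestamp at the T needs only str.replace, not a regular expression:

```python
def format_line(time_from, symbol_name, temp_value):
    """Build e.g. '2017-07-29, 08:00:00: Cloudy, 15°' from raw attribute strings."""
    # Replace only the first 'T' (the date/time separator) with ', '
    date_and_time = time_from.replace("T", ", ", 1)
    return "{}: {}, {}°".format(date_and_time, symbol_name, temp_value)


print(format_line("2017-07-29T08:00:00", "Cloudy", "15"))
# prints: 2017-07-29, 08:00:00: Cloudy, 15°
```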

(Note that several different elements in this XML carry name and value attributes, not only symbol and temperature.)

So far, my approach has been quite straightforward:

#!/usr/bin/env python
# coding: utf-8

import re
from BeautifulSoup import BeautifulSoup

# data is set to the above XML
soup = BeautifulSoup(data)
# collect the attribute values of interest into lists. Can this be done more wisely?
time_l = []
symb_l = []
temp_l = []
for i in soup.findAll('time'):
    i_time = str(i.get('from'))
    time_l.append(i_time)
for i in soup.findAll('symbol'):
    i_symb = str(i.get('name'))
    symb_l.append(i_symb)
for i in soup.findAll('temperature'):
    i_temp = str(i.get('value'))
    temp_l.append(i_temp)
# join the forecast lists to a dict
forc_l = []
for i, j in zip(symb_l, temp_l):
    forc_l.append([i, j])
rez = dict(zip(time_l, forc_l))
# combine and format the result. Can this dict be printed more simply?
wew = ''
for key in sorted(rez):
    wew += re.sub("T", ", ", key) + str(rez[key])
wew = re.sub("'", "", wew)
wew = re.sub(r"\[", ": ", wew)
wew = re.sub(r"\]", "°\n", wew)
# print the result
print wew

But surely there must be a better, smarter approach? Mostly I'm interested in collecting the attributes from the XML; my way seems rather clumsy to me. Also, is there a simpler way to print a dict like {'a': ['b', 'c']} nicely?
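For the dict-printing part, here is a sketch in Python 3 (the question's code is Python 2). It assumes a dict shaped like rez in the code above, {time: [symbol, temperature]}; the sample values are copied from the XML and the function name format_forecast is hypothetical. Unpacking each two-item list in the loop header removes the need for str() plus regex cleanup:

```python
# Hypothetical sample shaped like `rez` above: {time: [symbol name, temperature value]}
rez = {
    "2017-07-29T09:00:00": ["Partly cloudy", "17"],
    "2017-07-29T08:00:00": ["Cloudy", "15"],
}


def format_forecast(forecast):
    """Return one 'date, time: symbol, temp°' line per key, sorted by time."""
    lines = []
    # ISO timestamps sort chronologically as plain strings
    for time_from, (symbol_name, temp_value) in sorted(forecast.items()):
        date_and_time = time_from.replace("T", ", ", 1)
        lines.append("{}: {}, {}°".format(date_and_time, symbol_name, temp_value))
    return "\n".join(lines)


print(format_forecast(rez))
# prints:
# 2017-07-29, 08:00:00: Cloudy, 15°
# 2017-07-29, 09:00:00: Partly cloudy, 17°
```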

I would be grateful for any hints or suggestions.

Upvotes: 2

Views: 2023

Answers (3)

Rachit kapadia

Reputation: 701

Alternatively, you can read the XML data with the built-in xml.dom.minidom module. This collects the data you want:

from xml.dom.minidom import parse

doc = parse("path/to/xmlfile.xml")  # parse an XML file by name
itemlist = doc.getElementsByTagName('time')
for item in itemlist:
    from_tag = item.getAttribute('from')
    # take the first <symbol> and <temperature> inside this <time> element
    symbol_name = item.getElementsByTagName('symbol')[0].getAttribute('name')
    temp_value = item.getElementsByTagName('temperature')[0].getAttribute('value')
    print("{} :  {}, {}°".format(from_tag, symbol_name, temp_value))

Output will be as follows:

2017-07-29T08:00:00 :  Cloudy, 15°
2017-07-29T09:00:00 :  Partly cloudy, 17°
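Note that parse expects a well-formed document with a single root element. The snippet in the question has two top-level time elements, so on its own the parser would reject it; presumably the real file wraps them in some parent element. Here is a self-contained sketch (Python 3) that uses parseString on a trimmed copy of the snippet wrapped in a hypothetical forecast root; the function name forecast_lines is also hypothetical:

```python
from xml.dom.minidom import parseString

# Trimmed copy of the question's XML. The <forecast> root is a hypothetical
# wrapper added only so that the snippet becomes a well-formed document.
XML = """<forecast>
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
    <symbol number="4" numberEx="4" name="Cloudy" var="04"/>
    <temperature unit="celsius" value="15"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
    <symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
    <temperature unit="celsius" value="17"/>
</time>
</forecast>"""


def forecast_lines(xml_text):
    """Return one 'from : symbol, temp°' string per <time> element."""
    doc = parseString(xml_text)
    lines = []
    for item in doc.getElementsByTagName("time"):
        # Called on an element, getElementsByTagName searches only its descendants
        symbol = item.getElementsByTagName("symbol")[0]
        temperature = item.getElementsByTagName("temperature")[0]
        lines.append("{} : {}, {}°".format(
            item.getAttribute("from"),
            symbol.getAttribute("name"),
            temperature.getAttribute("value"),
        ))
    return lines


for line in forecast_lines(XML):
    print(line)
```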

Hope it is useful.

Upvotes: 2

Gahan

Reputation: 4213

Alternatively, you can use the built-in xml.etree.ElementTree module (I'm using Python 3.6.2):

import xml.etree.ElementTree as et  # part of the standard library

tree = et.parse("sample.xml")
root = tree.getroot()
for time_el in root.iter("time"):  # iterate over every <time> element in the document
    print(time_el.attrib["from"], end=": ")  # print the "from" attribute of <time>
    for sym in time_el.iter("symbol"):  # iterate over <symbol> elements inside this <time>
        print(sym.attrib["name"], end=", ")
    for t in time_el.iter("temperature"):  # iterate over <temperature> elements inside this <time>
        print(t.attrib["value"], end="°\n")
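A self-contained variant of the same idea, as a sketch under a few assumptions: Python 3, a trimmed copy of the question's XML, and a hypothetical forecast root element, which is needed because et.fromstring rejects the bare snippet's two top-level time elements. It uses Element.find, which returns the first matching direct child, and collects tuples (the function name collect is hypothetical) so that parsing and printing stay separate:

```python
import xml.etree.ElementTree as et

# Trimmed copy of the question's XML inside a hypothetical <forecast> root;
# et.fromstring() raises ParseError on the bare snippet (two top-level elements).
XML = """<forecast>
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
    <symbol number="4" numberEx="4" name="Cloudy" var="04"/>
    <temperature unit="celsius" value="15"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
    <symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
    <temperature unit="celsius" value="17"/>
</time>
</forecast>"""


def collect(xml_text):
    """Return (from, symbol name, temperature value) for every <time> element."""
    root = et.fromstring(xml_text)
    rows = []
    for time_el in root.iter("time"):
        # find() returns the first matching direct child of this <time> element
        rows.append((
            time_el.get("from"),
            time_el.find("symbol").get("name"),
            time_el.find("temperature").get("value"),
        ))
    return rows


for time_from, symbol_name, temp_value in collect(XML):
    print("{}: {}, {}°".format(time_from, symbol_name, temp_value))
```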

Upvotes: 1

Gahan

Reputation: 4213

from bs4 import BeautifulSoup

with open("sample.xml", "r") as f:  # open the XML file
    content = f.read()  # XML content as a string
soup = BeautifulSoup(content, "lxml")
for values in soup.find_all("time"):
    print("{} : {}, {}°".format(values["from"],
                                values.find("symbol")["name"],
                                values.find("temperature")["value"]))

Output:

2017-07-29T08:00:00 : Cloudy, 15°
2017-07-29T09:00:00 : Partly cloudy, 17°

Upvotes: 4
