plzhelpmi

Reputation: 37

Beautifulsoup loop through HTML

As mentioned in my previous question, I am using Beautiful Soup with Python to retrieve weather data from a website.

Here's what the website looks like:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
<channel>

I managed to retrieve the forecastIssue date and validTime. However, I am not able to retrieve the different area forecasts.

Here is my Python code:

import requests
from bs4 import BeautifulSoup
import urllib3

outfile = open('C:\scripts\idk.xml','w')

#getting the time

r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<keyrefno>')
soup = BeautifulSoup(r.content, "xml")
time = soup.find('validTime').string
print time

#print issue date and time
for currentdate in soup.findAll('item'):
    string = currentdate.find('forecastIssue')
    print string

This is the part where I want to retrieve the area forecasts, e.g. <area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>

for area in soup.findAll('weatherForecast'):
    areastring = area.find('area')
    print areastring

When I run my code in Python, it only retrieves the first area, which is Ang Mo Kio.

Sample output:

2.30 pm to 5.30 pm
<forecastIssue date="22-07-2016" time="02:30 PM"/>
<area forecast="RA" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>

Inspect element of the website

As you can see, the area forecast is within a div class.

  1. How do I loop through all the areas? I've tried googling, but findAll doesn't seem to work in my code.

  2. Is there any way to split the date and time?

  3. Is there any way I can write the data retrieved by BeautifulSoup to an XML file? My output file doesn't contain any data when I run the code.

Thank you.

Upvotes: 0

Views: 4213

Answers (2)

Ilja Everilä

Reputation: 52929

When I run my code in Python, it only retrieves the first area, which is Ang Mo Kio.

findAll('weatherForecast') returns a sequence of one element, given the provided XML. You then iterate through this sequence and call find('area') on it, which stops at the first matching element and returns it (or None if there is no match). To find all the area elements in weatherForecast:

for area in soup.find('weatherForecast').find_all('area'):
    print area
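
To see the difference side by side, here is a minimal sketch against a trimmed copy of the XML from the question. It assumes Python 3 and that lxml is installed (the "xml" parser depends on it); the SAMPLE string and the closing tags added to it are my own stand-in for the live feed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the feed from the question, closed off so it parses cleanly.
# This string is a stand-in for the live response (an assumption).
SAMPLE = """<channel>
<item>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
</weatherForecast>
</item>
</channel>"""

soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" parser requires lxml

forecast = soup.find("weatherForecast")

# find() returns only the first matching element (or None).
first_area = forecast.find("area")

# find_all() returns every matching element as a list-like ResultSet.
all_areas = forecast.find_all("area")

print(first_area["name"])
print([area["name"] for area in all_areas])
```

The first print shows a single name, the second shows all three, which is the behaviour the loop in the question was missing.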

Is there any way to split the date and time?

I'm not entirely sure what you mean; perhaps you want to extract the attribute values from the element:

for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print element['date'], element['time']
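
If "split" means turning those two strings into real date and time values, a sketch along these lines may help. It assumes Python 3, a day-month-year date and a 12-hour clock with AM/PM as in the sample ("18-07-2016", "03:30 PM"), and an English locale for %p. The helper name parse_issue is hypothetical.

```python
from datetime import datetime

def parse_issue(date_text, time_text):
    """Combine the date and time attribute strings into one datetime.

    Assumes the formats seen in the question's sample:
    DD-MM-YYYY for the date and HH:MM AM/PM for the time.
    """
    return datetime.strptime(date_text + " " + time_text, "%d-%m-%Y %I:%M %p")

# Values copied from the forecastIssue element in the question.
issued = parse_issue("18-07-2016", "03:30 PM")

print(issued.date())  # date part only
print(issued.time())  # time part only, in 24-hour form
```

In practice the two arguments would be element['date'] and element['time'] from the loop above.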

Upvotes: 2

akash karothiya

Reputation: 5950

1. To loop through all the areas:

areas = soup.select('area')
for data in areas:
    print(data.get('name'))

Output

Ang Mo Kio
Bedok
Bishan
Boon Lay
Bukit Batok
Bukit Merah

2. You can extract the date and time individually as well:

# Tag names are case-sensitive under the "xml" parser used in the question
date = soup.select('forecastIssue')[0].get('date')
time = soup.select('forecastIssue')[0].get('time')
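
As for the empty output file: the code in the question opens outfile but never writes to it, which would explain why nothing ends up there. A minimal sketch of writing the matched area elements out, assuming Python 3, the lxml-backed "xml" parser, a shortened stand-in for the feed, and a hypothetical output path in the temp directory:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Shortened stand-in for the feed in the question (an assumption).
SAMPLE = """<channel>
<item>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item>
</channel>"""

soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" parser requires lxml

# Hypothetical output location; substitute the path you actually want.
out_path = os.path.join(tempfile.gettempdir(), "forecast_areas.xml")

# The with-block closes the file, so buffered data is flushed to disk.
with open(out_path, "w", encoding="utf-8") as outfile:
    outfile.write("<weatherForecast>\n")
    for area in soup.select("area"):
        # str(tag) serializes the element back to markup.
        outfile.write(str(area) + "\n")
    outfile.write("</weatherForecast>\n")

with open(out_path, encoding="utf-8") as infile:
    written = infile.read()
print(written)
```

Writing str(soup) instead would save the whole parsed document rather than just the area elements.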

Upvotes: 2
