user9510596

Reputation: 41

Web scraping with Selenium + Python

The objective is to scrape the historical weather data from http://www.weather.gov.sg/climate-historical-daily/

To get the data for a particular month, I first have to select the city name, month and year.

There are 63 cities, 12 months and 41 years.

city = [el.text for el in driver.find_elements_by_xpath("/html/body/div/div/div[3]/div[1]/div[1]/div/div/ul/li/a")]
len(city)
Out[182]: 63

month = [el.text for el in driver.find_elements_by_xpath('//*[@id="monthDiv"]/ul/li')]
year = [el.text for el in driver.find_elements_by_xpath('//*[@id="yearDiv"]/ul/li')]

Then I click the Display button:

button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "display")))
button.click()

How can I select an option from these Bootstrap drop-down lists and copy the weather data from the table below?

<table class="table table-calendar"><colgroup>
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
                <col width="10%">
              </colgroup><thead><tr><th>Date</th><th>Daily Rainfall Total (mm)</th><th>Highest &nbsp;30-min Rainfall (mm)</th><th>Highest &nbsp;60-min Rainfall (mm)</th><th>Highest 120-min Rainfall (mm)</th><th>Mean Temperature (°C)</th><th>Maximum Temperature (°C)</th><th>Minimum Temperature (°C)</th><th>Mean Wind Speed (km/h)</th><th>Max Wind Speed (km/h)</th></tr></thead><tbody><tr><td>1 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.5</td><td align="center">30.4</td><td align="center">26.0</td><td align="center">12.3</td><td align="center">40.7</td></tr><tr><td>2 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.9</td><td align="center">31.7</td><td align="center">26.9</td><td align="center">10.3</td><td align="center">31.5</td></tr><tr><td>3 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.2</td><td align="center">31.7</td><td align="center">27.2</td><td align="center">12.0</td><td align="center">31.5</td></tr><tr><td>4 Aug</td><td align="center">4.8</td><td align="center">4.6</td><td align="center">4.8</td><td align="center">4.8</td><td align="center">27.9</td><td align="center">30.2</td><td align="center">24.1</td><td align="center">8.8</td><td align="center">44.4</td></tr><tr><td>5 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.8</td><td align="center">31.8</td><td align="center">26.7</td><td align="center">8.6</td><td align="center">25.9</td></tr><tr><td>6 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.2</td><td align="center">31.4</td><td align="center">27.6</td><td align="center">8.1</td><td 
align="center">27.8</td></tr><tr><td>7 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.4</td><td align="center">32.7</td><td align="center">27.3</td><td align="center">11.4</td><td align="center">29.6</td></tr><tr><td>8 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.7</td><td align="center">32.9</td><td align="center">27.6</td><td align="center">11.0</td><td align="center">27.8</td></tr><tr><td>9 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.6</td><td align="center">32.8</td><td align="center">27.7</td><td align="center">12.3</td><td align="center">31.5</td></tr><tr><td>10 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.7</td><td align="center">33.0</td><td align="center">27.8</td><td align="center">12.9</td><td align="center">33.3</td></tr><tr><td>11 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.5</td><td align="center">32.7</td><td align="center">28.2</td><td align="center">11.0</td><td align="center">31.5</td></tr><tr><td>12 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">27.9</td><td align="center">30.0</td><td align="center">26.8</td><td align="center">8.7</td><td align="center">31.5</td></tr><tr><td>13 Aug</td><td align="center">34.6</td><td align="center">22.2</td><td align="center">30.8</td><td align="center">33.4</td><td align="center">28.3</td><td align="center">32.2</td><td align="center">22.5</td><td align="center">6.4</td><td align="center">40.7</td></tr><tr><td>14 Aug</td><td align="center">13.8</td><td 
align="center">7.2</td><td align="center">12.2</td><td align="center">12.6</td><td align="center">25.9</td><td align="center">28.5</td><td align="center">23.4</td><td align="center">5.1</td><td align="center">35.2</td></tr><tr><td>15 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.0</td><td align="center">31.5</td><td align="center">24.6</td><td align="center">6.5</td><td align="center">25.9</td></tr><tr><td>16 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.0</td><td align="center">30.0</td><td align="center">26.4</td><td align="center">8.0</td><td align="center">27.8</td></tr><tr><td>17 Aug</td><td align="center">5.2</td><td align="center">4.0</td><td align="center">4.6</td><td align="center">4.6</td><td align="center">27.4</td><td align="center">31.4</td><td align="center">24.3</td><td align="center">6.2</td><td align="center">29.6</td></tr><tr><td>18 Aug</td><td align="center">2.0</td><td align="center">1.0</td><td align="center">1.0</td><td align="center">2.0</td><td align="center">27.1</td><td align="center">30.1</td><td align="center">25.3</td><td align="center">6.4</td><td align="center">48.2</td></tr><tr><td>19 Aug</td><td align="center">1.8</td><td align="center">1.4</td><td align="center">1.6</td><td align="center">1.8</td><td align="center">28.0</td><td align="center">31.3</td><td align="center">25.4</td><td align="center">5.7</td><td align="center">25.9</td></tr><tr><td>20 Aug</td><td align="center">2.2</td><td align="center">2.0</td><td align="center">2.0</td><td align="center">2.0</td><td align="center">28.1</td><td align="center">31.9</td><td align="center">25.5</td><td align="center">10.6</td><td align="center">37.0</td></tr><tr><td>21 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td 
align="center">29.6</td><td align="center">33.0</td><td align="center">27.7</td><td align="center">15.2</td><td align="center">31.5</td></tr><tr><td>22 Aug</td><td align="center">2.0</td><td align="center">1.4</td><td align="center">1.6</td><td align="center">1.6</td><td align="center">27.9</td><td align="center">32.1</td><td align="center">25.3</td><td align="center">9.3</td><td align="center">38.9</td></tr><tr><td>23 Aug</td><td align="center">24.4</td><td align="center">8.2</td><td align="center">11.2</td><td align="center">15.2</td><td align="center">25.6</td><td align="center">27.0</td><td align="center">23.0</td><td align="center">5.1</td><td align="center">48.2</td></tr><tr><td>24 Aug</td><td align="center">0.0</td><td align="center">0.2</td><td align="center">0.2</td><td align="center">0.2</td><td align="center">28.1</td><td align="center">32.4</td><td align="center">24.5</td><td align="center">9.0</td><td align="center">33.3</td></tr><tr><td>25 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">27.9</td><td align="center">31.9</td><td align="center">25.7</td><td align="center">8.6</td><td align="center">44.4</td></tr><tr><td>26 Aug</td><td align="center">4.6</td><td align="center">4.4</td><td align="center">4.6</td><td align="center">4.6</td><td align="center">27.0</td><td align="center">31.3</td><td align="center">24.0</td><td align="center">9.6</td><td align="center">51.9</td></tr><tr><td>27 Aug</td><td align="center">1.4</td><td align="center">1.4</td><td align="center">1.4</td><td align="center">1.4</td><td align="center">27.8</td><td align="center">30.4</td><td align="center">25.6</td><td align="center">8.4</td><td align="center">27.8</td></tr><tr><td>28 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.9</td><td align="center">32.3</td><td align="center">26.2</td><td 
align="center">9.6</td><td align="center">33.3</td></tr><tr><td>29 Aug</td><td align="center">6.6</td><td align="center">2.8</td><td align="center">3.4</td><td align="center">4.8</td><td align="center">27.2</td><td align="center">30.8</td><td align="center">25.1</td><td align="center">8.0</td><td align="center">-</td></tr><tr><td>30 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">28.6</td><td align="center">32.1</td><td align="center">26.4</td><td align="center">11.2</td><td align="center">35.2</td></tr><tr><td>31 Aug</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">0.0</td><td align="center">29.0</td><td align="center">32.2</td><td align="center">27.2</td><td align="center">11.7</td><td align="center">29.6</td></tr></tbody></table>

Upvotes: 1

Views: 134

Answers (2)

Virender Kamboj

Reputation: 105

City, Month and Year are not drop-downs. They are buttons, so they can be handled with a simple click operation.

Please try the code below to select a city, and use the same approach for Month and Year.

city_button = driver.find_element_by_id('cityname')  # Locate the City button
city_button.click()  # Open the city list
bukit_timah = driver.find_element_by_xpath("//a[text()='Bukit Timah']")  # Locate the 'Bukit Timah' entry
bukit_timah.click()  # Select 'Bukit Timah' from the list
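
The same two clicks can be wrapped in a small helper so that City, Month and Year share one code path. This is only a sketch: `pick_option` is a hypothetical name, and the two constants simply hold the same strings as Selenium's `By.ID` and `By.XPATH`. It uses the `find_element(by, value)` form, which is available in both Selenium 3 and Selenium 4.

```python
# Locator strategy names; Selenium's By.ID and By.XPATH hold these same strings.
BY_ID = "id"
BY_XPATH = "xpath"


def pick_option(driver, button_id, option_text):
    """Click the button with id `button_id`, then the <a> whose text is `option_text`.

    Hypothetical helper: it assumes every list entry is an <a> element whose
    text matches exactly, as in the 'Bukit Timah' example.
    """
    driver.find_element(BY_ID, button_id).click()  # open the list
    option_xpath = f"//a[text()='{option_text}']"
    driver.find_element(BY_XPATH, option_xpath).click()  # choose the entry
    return option_xpath
```

For the city this would be called as `pick_option(driver, "cityname", "Bukit Timah")`. The ids of the Month and Year buttons are not shown here, so they have to be looked up in the DOM first. An option text that contains a single quote would break the XPath, and a short text such as a year may match other links on the page, in which case the XPath should be scoped to the relevant list.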

Please refer to the screenshot to understand the DOM.
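
For the second half of the question, copying the table, one possible approach is to parse the table markup with Python's standard `html.parser` once the Display button has been clicked. The sketch below assumes the markup is available as a string (for example from `driver.page_source`, which is an assumption about how the page renders). The class name `TableRows`, the helper `to_float`, and the trimmed `sample` string (three columns and two rows taken from the table in the question) are illustrative only.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_float(text):
    """Return the cell as a float, or None for non-numeric cells such as '-'."""
    try:
        return float(text)
    except ValueError:
        return None


# Trimmed sample: three of the ten columns, two of the rows.
sample = (
    '<table class="table table-calendar"><thead><tr><th>Date</th>'
    '<th>Daily Rainfall Total (mm)</th><th>Max Wind Speed (km/h)</th></tr></thead>'
    '<tbody><tr><td>1 Aug</td><td align="center">0.0</td><td align="center">40.7</td></tr>'
    '<tr><td>29 Aug</td><td align="center">6.6</td><td align="center">-</td></tr>'
    '</tbody></table>'
)

parser = TableRows()
parser.feed(sample)
parser.close()

header, *body = parser.rows
records = [[row[0]] + [to_float(cell) for cell in row[1:]] for row in body]
print(header)
print(records)
```

The `-` in the 29 Aug row marks a missing reading, which is why the conversion returns `None` instead of raising.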

Upvotes: 0

baduker

Reputation: 20042

Here's a different approach.

Why not get all the .csv files for all the cities and all the dates? The link to the file is static and uses the code of the city that's in the drop-down menu. You can parse this, grab the code, put it in the url and get the .csv file. Oh, and you have to loop over all the years too.

By the way, not all cities have data for the past 40 years.

import re
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "PostmanRuntime/7.26.5",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
}

response = requests.get("http://www.weather.gov.sg/climate-historical-daily/")

soup = BeautifulSoup(response.text, "html.parser").find("ul", {"class": "dropdown-menu long-dropdown"}).find_all("li")
cities_and_codes = {
    t.find("a").getText(strip=True): re.search(r'(S\d+)', t.find("a")['onclick']).group(1)
    for t in soup
}


def get_dates():
    yield from (
        [(y, f"0{m}" if m < 10 else m) for y in range(1980, 2021) for m in range(1, 13)]
    )


files_url = "http://www.weather.gov.sg/files/dailydata/DAILYDATA_"
for city, code in cities_and_codes.items():
    for date in get_dates():
        year, month = date
        csv_url = f"{files_url}{code}_{year}{month}.csv"
        response = requests.get(csv_url)
        if response.status_code == 200:
            print(f"Fetching data for {city} for {month}/{year}")
            print(f"Found data. Fetching {csv_url}")
            with open(f"{city.replace(' ', '_')}_{csv_url.split('/')[-1]}", "wb") as f:
                f.write(response.content)
        else:
            print(f"No data available for {city} for {month}/{year}...")
            continue

You can play around with this and just get the files for those cities you want, or all of them, but that might take a while.
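
To make that kind of filtering easier, the URL pattern used above can be pulled out into small helpers. This is a sketch, not a replacement for the script: `station_code`, `csv_url` and `year_months` are hypothetical names, and the `onclick` string in the example is made up, since the real attribute text on the site may look different. Only the `S` plus digits pattern and the URL layout are taken from the code above.

```python
import re

FILES_URL = "http://www.weather.gov.sg/files/dailydata/DAILYDATA_"


def station_code(onclick):
    """Return the first 'S<digits>' token in an onclick string, or None."""
    match = re.search(r"S\d+", onclick)
    return match.group(0) if match else None


def csv_url(code, year, month):
    """Build the CSV URL for one station and month (month is zero-padded)."""
    return f"{FILES_URL}{code}_{year}{month:02d}.csv"


def year_months(first_year, last_year):
    """Yield (year, month) pairs for every month in the inclusive year range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            yield year, month


# Made-up onclick value and station code, for illustration only.
code = station_code("setCityName('Example Town', 'S123')")
print(csv_url(code, 2020, 8))
```

With these in place, restricting the download to a few cities or a shorter period only means changing the arguments, for example `year_months(2015, 2020)`.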

Upvotes: 1
