Vishaal Sudarsan

Reputation: 121

Parsing a Table from the following website

I want to collect the past weather details of a particular city in India for each day of the year 2016. The following website has this data:

"https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016"

This link has the data for January 2016, and the page contains a nice table.

I want to extract this table.

After several attempts, I managed to extract a different table from the page, but that is not the one I want; it does not serve my purpose.

I want the other, larger table that lists the data by time of day for each day of that month, because then I can loop over all months using the URL.

The problem is that I do not know HTML or related technologies, so I am not able to scrape the data myself.

Upvotes: 1

Views: 260

Answers (1)

Vineet Chaurasiya

Reputation: 105

It would have been better if you had included the code you tried. Anyway, this code works for the 1st Jan table. You can write a loop to extract the data for the other days as well.

from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016"
page = urlopen(url)
soup = BeautifulSoup(page, 'lxml')

Data = []
table = soup.find('table', attrs={'id':'wt-his'})
for tr in table.find('tbody').find_all('tr'):
    row = {}  # one record per table row; named row to avoid shadowing the built-in dict
    row['time'] = tr.find('th').text.strip()
    all_td = tr.find_all('td')
    # all_td[0] is not used; the values start at index 1
    row['temp'] = all_td[1].text
    row['weather'] = all_td[2].text
    row['wind'] = all_td[3].text
    arrow = all_td[4].text
    if arrow == '↑':
        row['wind_dir'] = 'South to North'
    else:
        row['wind_dir'] = 'North to South'

    row['humidity'] = all_td[5].text
    row['barometer'] = all_td[6].text
    row['visibility'] = all_td[7].text

    Data.append(row)

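To cover the whole year, as the question asks, the month URLs can be generated rather than typed by hand. This is only a minimal sketch: the names BASE_URL and month_urls are made up for illustration, and it assumes every month's page uses the same month and year query parameters as the January link.

```python
from urllib.parse import urlencode

# Same page as in the question, without the query string.
BASE_URL = "https://www.timeanddate.com/weather/india/kanpur/historic"

def month_urls(year):
    """Build one URL per month of the given year (hypothetical helper)."""
    return [
        BASE_URL + "?" + urlencode({"month": month, "year": year})
        for month in range(1, 13)
    ]

# Each URL can then be passed to urlopen() and parsed the same way as above.
for url in month_urls(2016):
    print(url)
```
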
Note: the wind_dir logic only distinguishes '↑' from everything else; add cases for the other arrows.
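One way to handle the other cases is a lookup table instead of a chain of if/else branches. This is a rough sketch under assumptions: only the '↑' entry comes from the code above, the remaining glyphs and their meanings are extrapolated and should be checked against what the page actually shows, and the names ARROW_TO_DIRECTION and wind_direction are hypothetical.

```python
# Hypothetical mapping from arrow glyph to the direction the wind is blowing.
# Only '↑' is taken from the answer; the other entries are assumptions.
ARROW_TO_DIRECTION = {
    '↑': 'South to North',
    '↓': 'North to South',
    '→': 'West to East',
    '←': 'East to West',
    '↗': 'Southwest to Northeast',
    '↘': 'Northwest to Southeast',
    '↙': 'Northeast to Southwest',
    '↖': 'Southeast to Northwest',
}

def wind_direction(arrow):
    """Return a readable direction for an arrow glyph, or 'Unknown'."""
    return ARROW_TO_DIRECTION.get(arrow.strip(), 'Unknown')
```

Inside the loop, the if/else would then become a single call such as wind_direction(all_td[4].text), and unrecognised cell contents end up as 'Unknown' instead of being labelled 'North to South'.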

Upvotes: 1
