Hamza Waheed

Reputation: 155

BeautifulSoup or requests not reading some section of a webpage

I'm new to web scraping and having some trouble getting data from a webpage.

I'm trying to read this web page: https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018

I'm trying to get the wind speed data from a div element with the class wstext, but for some reason the page that the requests library downloads does not contain this class or some of its ancestors.

import requests
import bs4 as bs
import numpy as np

wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)

html = requests.get('https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018')

print(html.text)

soup = bs.BeautifulSoup(html.content, 'html5lib')

print(soup.prettify())

windList = soup.findAll('div')
print(windList)

I've tried printing the HTML that requests returned, both directly and after parsing it with BeautifulSoup, to see whether it contained that class, but I couldn't find it. Any help would be greatly appreciated.

Upvotes: 2

Views: 111

Answers (2)

Alex Yu

Reputation: 3537

My exploration and a very, very dirty "kind of solution" to the problem

1. BeautifulSoup is just fine

Look at the pandas solution: it works just fine.

Look at the pandas source: we see that pandas is using _BeautifulSoupHtml5LibFrameParser.

Ergo: BeautifulSoup is fine.

2. "Nitty-gritty dirty kinda solution" with curl

Let's try curl:

$ curl https://www.timeanddate.com/weather/pakistan/lahore/historic\?month\=7\&year\=2018 > result.html   
$ less result.html

What we see here:

</script><script type="text/javascript">
var data={"copyright":"Contents are strictly for use by 
timeanddate.com","units": 
{"temp":"°C","prec":"mm","wind":"km\/h","baro":"mbar"},
"temp":        
[{"date":15304047E5,"temp":29},{"date":15304065E5,"temp":29},  
{"date":15304083E5,"temp":29},{"date":15304101E5,"temp":28},
...

I suppose this is the data the OP is looking for.
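The date values in that blob, such as 15304047E5, look like Unix timestamps in milliseconds. That is an assumption based on their magnitude, not something the site documents, and whether they are UTC or local time is also unknown. Under that assumption, a small sketch of the conversion:

```python
import json
from datetime import datetime, timezone

# One entry copied from the "temp" list shown above.
entry = json.loads('{"date":15304047E5,"temp":29}')

# Assumption: "date" is a Unix timestamp in milliseconds.
# Reading it as UTC is also an assumption.
seconds = entry['date'] / 1000
when = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(when.isoformat(), entry['temp'])
```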

3. Possible solution

  1. Download the URL one way or another. curl, wget or requests should all work.
  2. Extract var data from the downloaded HTML. Python's str methods should be sufficient.
  3. Pass the extracted text to json.loads.
  4. Done.

The beauty of this solution is that the data comes as is, with no need to decode it from an HTML <table>.
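The steps above can be sketched roughly as follows. This is a minimal sketch, not a tested scraper: extract_var_data is a made-up helper name, the sample HTML is a shortened stand-in for the real page, and it assumes the page still embeds the data as var data={...}. It uses json.JSONDecoder.raw_decode to parse the object that starts right after the marker, so it does not depend on where the next ';' happens to fall.

```python
import json

def extract_var_data(html, marker='var data='):
    """Return the JSON object assigned to `var data` in the page source.

    Hypothetical helper: assumes the page embeds its data as
    `var data={...}` inside an inline <script>.
    """
    start = html.find(marker)
    if start == -1:
        raise ValueError('marker %r not found in page source' % marker)
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while idx < len(html) and html[idx].isspace():
        idx += 1
    # raw_decode parses one JSON value starting at idx and ignores
    # whatever follows it (the trailing ';' and the rest of the script).
    obj, _end = json.JSONDecoder().raw_decode(html, idx)
    return obj

# Made-up stand-in for the downloaded page; in practice this would be
# requests.get(url).text or the contents of result.html.
sample_html = (
    '<script type="text/javascript">\n'
    'var data={"units":{"wind":"km\\/h"},'
    '"temp":[{"date":15304047E5,"temp":29}]};\n'
    '</script>'
)

data = extract_var_data(sample_html)
print(data['units']['wind'])
print(data['temp'][0]['temp'])
```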

P.S.

Personally, I like the pandas solution, because pandas is a great library in itself.

But pandas is not needed to solve this problem.

Upvotes: 2

chitown88

Reputation: 28565

Pandas can do the work for you rather than using bs4 or requests:

import numpy as np
import pandas as pd

wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)

url = 'https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018'

tables = pd.read_html(url)

table = tables[1]

print (table.iloc[:,4])

Output:

0       3 mph
1     No wind
2     No wind
3     No wind
4     No wind
5     No wind
6     No wind
7       3 mph
8       5 mph
9       6 mph
10      5 mph
11      5 mph
12      6 mph
13      5 mph
14    No wind
15      3 mph
16    No wind
17    No wind
18    No wind
19    No wind
20      5 mph
21    No wind
22      6 mph
23      6 mph
24      5 mph
25      6 mph
26      7 mph
27      7 mph
28      7 mph
29      3 mph
30      3 mph
31      3 mph
32      3 mph
33    No wind
34      3 mph
35      3 mph
36    No wind
37    No wind
38        NaN
Name: (Unnamed: 4_level_0, Wind), dtype: object
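The Wind column comes back as strings such as "3 mph" and "No wind", with NaN for the empty last row. If numbers are needed, something along these lines may work. parse_wind is a hypothetical helper, and the only formats it handles are the ones visible in the output above:

```python
import math

def parse_wind(value):
    """Convert a Wind cell such as '3 mph' to a float (hypothetical helper).

    'No wind' becomes 0.0; missing or unrecognised cells become NaN.
    """
    if not isinstance(value, str):
        return math.nan          # e.g. the NaN in the last row
    text = value.strip()
    if text.lower() == 'no wind':
        return 0.0
    parts = text.split()
    try:
        return float(parts[0])   # '3 mph' -> 3.0
    except (IndexError, ValueError):
        return math.nan

# Sample cells taken from the output above.
cells = ['3 mph', 'No wind', '5 mph', float('nan')]
speeds = [parse_wind(c) for c in cells]
print(speeds)
```

With the DataFrame from this answer, the same function could be applied with table.iloc[:, 4].map(parse_wind).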

Option 2:

You can find and pull the JSON structure out of the HTML and then work with that. When I tried that, though, the data covered the whole month in 6-hour intervals rather than a single day by the hour:

import numpy as np
import requests
import bs4
import json

wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)

url = 'https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018'

response = requests.get(url)

soup = bs4.BeautifulSoup(response.text, 'html.parser')

scripts = soup.find_all('script')
jsonObj = None

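for script in scripts:
    # The weather data is embedded in an inline script as "var data={...};"
    if 'var data=' in script.text:
        jsonStr = script.text.strip()

        # Keep only the text between "var data=" and the first ";"
        jsonStr = jsonStr.split('var data=')[1]
        jsonStr = jsonStr.split(';')[0]

        jsonObj = json.loads(jsonStr)

# Fail with a clear message if no matching script tag was found
if jsonObj is None:
    raise ValueError('Could not find "var data=" in any script tag')

for item in jsonObj['detail']:
    date = item['ds']
    wind = item['wind']

    print('Date: %-40s   Wind: %s' % (date, wind))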

Output:

Date: Sunday, 1 July 2018, 00:00 — 06:00         Wind: 0.621
Date: Sunday, 1 July 2018, 06:00 — 12:00         Wind: 3.728
Date: Sunday, 1 July 2018, 12:00 — 18:00         Wind: 3.107
Date: Sunday, 1 July 2018, 18:00 — 00:00         Wind: 3.107
Date: Monday, 2 July 2018, 00:00 — 06:00         Wind: 1.864
Date: Monday, 2 July 2018, 06:00 — 12:00         Wind: 5.593
Date: Monday, 2 July 2018, 12:00 — 18:00         Wind: 8.7
Date: Monday, 2 July 2018, 18:00 — 00:00         Wind: 9.943
Date: Tuesday, 3 July 2018, 00:00 — 06:00        Wind: 10.564
Date: Tuesday, 3 July 2018, 06:00 — 12:00        Wind: 11.185
Date: Tuesday, 3 July 2018, 12:00 — 18:00        Wind: 9.943
Date: Tuesday, 3 July 2018, 18:00 — 00:00        Wind: 6.214
Date: Wednesday, 4 July 2018, 00:00 — 06:00      Wind: 6.836
Date: Wednesday, 4 July 2018, 06:00 — 12:00      Wind: 4.971
Date: Wednesday, 4 July 2018, 12:00 — 18:00      Wind: 6.214
Date: Wednesday, 4 July 2018, 18:00 — 00:00      Wind: 3.728
Date: Thursday, 5 July 2018, 00:00 — 06:00       Wind: 1.864
Date: Thursday, 5 July 2018, 06:00 — 12:00       Wind: 1.864
Date: Thursday, 5 July 2018, 12:00 — 18:00       Wind: 3.107
Date: Thursday, 5 July 2018, 18:00 — 00:00       Wind: 3.107
Date: Friday, 6 July 2018, 00:00 — 06:00         Wind: 1.864
Date: Friday, 6 July 2018, 06:00 — 12:00         Wind: 6.214
Date: Friday, 6 July 2018, 12:00 — 18:00         Wind: 6.836
Date: Friday, 6 July 2018, 18:00 — 00:00         Wind: 3.728
Date: Saturday, 7 July 2018, 00:00 — 06:00       Wind: 1.243
Date: Saturday, 7 July 2018, 06:00 — 12:00       Wind: 2.486
Date: Saturday, 7 July 2018, 12:00 — 18:00       Wind: 6.836
Date: Saturday, 7 July 2018, 18:00 — 00:00       Wind: 2.486
Date: Sunday, 8 July 2018, 00:00 — 06:00         Wind: 3.107
Date: Sunday, 8 July 2018, 06:00 — 12:00         Wind: 6.214
Date: Sunday, 8 July 2018, 12:00 — 18:00         Wind: 5.593
Date: Sunday, 8 July 2018, 18:00 — 00:00         Wind: 4.35
Date: Monday, 9 July 2018, 00:00 — 06:00         Wind: 5.593
Date: Monday, 9 July 2018, 06:00 — 12:00         Wind: 5.593
Date: Monday, 9 July 2018, 12:00 — 18:00         Wind: 6.214
Date: Monday, 9 July 2018, 18:00 — 00:00         Wind: 4.35
Date: Tuesday, 10 July 2018, 00:00 — 06:00       Wind: 6.836
Date: Tuesday, 10 July 2018, 06:00 — 12:00       Wind: 8.078
Date: Tuesday, 10 July 2018, 12:00 — 18:00       Wind: 6.836
Date: Tuesday, 10 July 2018, 18:00 — 00:00       Wind: 5.593
Date: Wednesday, 11 July 2018, 00:00 — 06:00     Wind: 6.214
Date: Wednesday, 11 July 2018, 06:00 — 12:00     Wind: 12.428
Date: Wednesday, 11 July 2018, 12:00 — 18:00     Wind: 8.078
Date: Wednesday, 11 July 2018, 18:00 — 00:00     Wind: 5.593
Date: Thursday, 12 July 2018, 00:00 — 06:00      Wind: 4.971
Date: Thursday, 12 July 2018, 06:00 — 12:00      Wind: 8.078
Date: Thursday, 12 July 2018, 12:00 — 18:00      Wind: 7.457
Date: Thursday, 12 July 2018, 18:00 — 00:00      Wind: 6.214
Date: Friday, 13 July 2018, 00:00 — 06:00        Wind: 5.593
Date: Friday, 13 July 2018, 06:00 — 12:00        Wind: 11.807
Date: Friday, 13 July 2018, 12:00 — 18:00        Wind: 9.321
Date: Friday, 13 July 2018, 18:00 — 00:00        Wind: 5.593
Date: Saturday, 14 July 2018, 00:00 — 06:00      Wind: 4.971
Date: Saturday, 14 July 2018, 06:00 — 12:00      Wind: 4.971
Date: Saturday, 14 July 2018, 12:00 — 18:00      Wind: 6.214
Date: Saturday, 14 July 2018, 18:00 — 00:00      Wind: 6.214
Date: Sunday, 15 July 2018, 00:00 — 06:00        Wind: 8.7
Date: Sunday, 15 July 2018, 06:00 — 12:00        Wind: 8.7
Date: Sunday, 15 July 2018, 12:00 — 18:00        Wind: 8.7
Date: Sunday, 15 July 2018, 18:00 — 00:00        Wind: 5.593
Date: Monday, 16 July 2018, 00:00 — 06:00        Wind: 4.971
Date: Monday, 16 July 2018, 06:00 — 12:00        Wind: 11.185
Date: Monday, 16 July 2018, 12:00 — 18:00        Wind: 11.185
Date: Monday, 16 July 2018, 18:00 — 00:00        Wind: 8.7
Date: Tuesday, 17 July 2018, 00:00 — 06:00       Wind: 7.457
Date: Tuesday, 17 July 2018, 06:00 — 12:00       Wind: 8.078
Date: Tuesday, 17 July 2018, 12:00 — 18:00       Wind: 6.836
Date: Tuesday, 17 July 2018, 18:00 — 00:00       Wind: 4.971
Date: Wednesday, 18 July 2018, 00:00 — 06:00     Wind: 3.728
Date: Wednesday, 18 July 2018, 06:00 — 12:00     Wind: 2.486
Date: Wednesday, 18 July 2018, 12:00 — 18:00     Wind: 6.214
Date: Wednesday, 18 July 2018, 18:00 — 00:00     Wind: 4.971
Date: Thursday, 19 July 2018, 00:00 — 06:00      Wind: 4.971
Date: Thursday, 19 July 2018, 06:00 — 12:00      Wind: 5.593
Date: Thursday, 19 July 2018, 12:00 — 18:00      Wind: 6.214
Date: Thursday, 19 July 2018, 18:00 — 00:00      Wind: 1.864
Date: Friday, 20 July 2018, 00:00 — 06:00        Wind: 2.486
Date: Friday, 20 July 2018, 06:00 — 12:00        Wind: 5.593
Date: Friday, 20 July 2018, 12:00 — 18:00        Wind: 8.078
Date: Friday, 20 July 2018, 18:00 — 00:00        Wind: 3.728
Date: Saturday, 21 July 2018, 00:00 — 06:00      Wind: 0.621
Date: Saturday, 21 July 2018, 06:00 — 12:00      Wind: 1.243
Date: Saturday, 21 July 2018, 12:00 — 18:00      Wind: 2.486
Date: Saturday, 21 July 2018, 18:00 — 00:00      Wind: 7.457
Date: Sunday, 22 July 2018, 00:00 — 06:00        Wind: 4.971
Date: Sunday, 22 July 2018, 06:00 — 12:00        Wind: 6.836
Date: Sunday, 22 July 2018, 12:00 — 18:00        Wind: 4.35
Date: Sunday, 22 July 2018, 18:00 — 00:00        Wind: 4.35
Date: Monday, 23 July 2018, 00:00 — 06:00        Wind: 2.486
Date: Monday, 23 July 2018, 06:00 — 12:00        Wind: 6.214
Date: Monday, 23 July 2018, 12:00 — 18:00        Wind: 6.836
Date: Monday, 23 July 2018, 18:00 — 00:00        Wind: 4.971
Date: Tuesday, 24 July 2018, 00:00 — 06:00       Wind: 3.107
Date: Tuesday, 24 July 2018, 06:00 — 12:00       Wind: 7.457
Date: Tuesday, 24 July 2018, 12:00 — 18:00       Wind: 4.35
Date: Tuesday, 24 July 2018, 18:00 — 00:00       Wind: 2.486
Date: Wednesday, 25 July 2018, 00:00 — 06:00     Wind: 1.243
Date: Wednesday, 25 July 2018, 06:00 — 12:00     Wind: 3.728
Date: Wednesday, 25 July 2018, 12:00 — 18:00     Wind: 6.836
Date: Wednesday, 25 July 2018, 18:00 — 00:00     Wind: 7.457
Date: Thursday, 26 July 2018, 00:00 — 06:00      Wind: 7.457
Date: Thursday, 26 July 2018, 06:00 — 12:00      Wind: 9.321
Date: Thursday, 26 July 2018, 12:00 — 18:00      Wind: 11.185
Date: Thursday, 26 July 2018, 18:00 — 00:00      Wind: 7.457
Date: Friday, 27 July 2018, 00:00 — 06:00        Wind: 6.836
Date: Friday, 27 July 2018, 06:00 — 12:00        Wind: 5.593
Date: Friday, 27 July 2018, 12:00 — 18:00        Wind: 4.35
Date: Friday, 27 July 2018, 18:00 — 00:00        Wind: 4.35
Date: Saturday, 28 July 2018, 00:00 — 06:00      Wind: 3.728
Date: Saturday, 28 July 2018, 06:00 — 12:00      Wind: 6.214
Date: Saturday, 28 July 2018, 12:00 — 18:00      Wind: 1.864
Date: Saturday, 28 July 2018, 18:00 — 00:00      Wind: 3.728
Date: Sunday, 29 July 2018, 00:00 — 06:00        Wind: 3.107
Date: Sunday, 29 July 2018, 06:00 — 12:00        Wind: 6.836
Date: Sunday, 29 July 2018, 12:00 — 18:00        Wind: 5.593
Date: Sunday, 29 July 2018, 18:00 — 00:00        Wind: 2.486
Date: Monday, 30 July 2018, 00:00 — 06:00        Wind: 1.864
Date: Monday, 30 July 2018, 06:00 — 12:00        Wind: 3.728
Date: Monday, 30 July 2018, 12:00 — 18:00        Wind: 4.971
Date: Monday, 30 July 2018, 18:00 — 00:00        Wind: 2.486
Date: Tuesday, 31 July 2018, 00:00 — 06:00       Wind: 1.243
Date: Tuesday, 31 July 2018, 06:00 — 12:00       Wind: 6.836
Date: Tuesday, 31 July 2018, 12:00 — 18:00       Wind: 6.836
Date: Tuesday, 31 July 2018, 18:00 — 00:00       Wind: 3.107

Here's the breakdown of the JSON format to get to wind: the top-level 'detail' key holds a list, and each entry in it has a 'ds' (date string) and a 'wind' value.
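The question's dailyWindRecord is shaped (30, 4), which suggests one row per day with four readings. As a rough sketch, the 6-hour entries can be grouped by day using the date part of 'ds'. The two-day sample is copied from the output above; group_by_day is a hypothetical helper, and it assumes every 'ds' ends with a comma followed by the time range.

```python
def group_by_day(detail):
    """Group wind readings by day (hypothetical helper).

    Assumes each item's 'ds' is '<weekday>, <date>, <time range>'.
    """
    days = {}
    for item in detail:
        # Drop the trailing time range to get e.g. 'Sunday, 1 July 2018'.
        day = item['ds'].rsplit(', ', 1)[0]
        days.setdefault(day, []).append(item['wind'])
    return days

# Stand-in for jsonObj['detail'], using values from the output above.
detail = [
    {'ds': 'Sunday, 1 July 2018, 00:00 — 06:00', 'wind': 0.621},
    {'ds': 'Sunday, 1 July 2018, 06:00 — 12:00', 'wind': 3.728},
    {'ds': 'Sunday, 1 July 2018, 12:00 — 18:00', 'wind': 3.107},
    {'ds': 'Sunday, 1 July 2018, 18:00 — 00:00', 'wind': 3.107},
    {'ds': 'Monday, 2 July 2018, 00:00 — 06:00', 'wind': 1.864},
    {'ds': 'Monday, 2 July 2018, 06:00 — 12:00', 'wind': 5.593},
    {'ds': 'Monday, 2 July 2018, 12:00 — 18:00', 'wind': 8.7},
    {'ds': 'Monday, 2 July 2018, 18:00 — 00:00', 'wind': 9.943},
]

daily = group_by_day(detail)
for day, winds in daily.items():
    # Each day maps to its four 6-hour readings, in order.
    print('%-25s %s  mean: %.3f' % (day, winds, sum(winds) / len(winds)))
```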

Upvotes: 3
