Reputation: 11
I am trying to extract some numbers from a graph on this page (https://www.zoopla.co.uk/local-info/?outcode=cm15&incode=9bq).
There are 5 tabs in that graph.
I am interested in the 5th tab (Newspapers).
When I run the code below, I get some information about the first tabbed graph, but soup.find_all('',id='neighbours-newspapers') returns an empty list.
from bs4 import BeautifulSoup as bs
import requests
res=requests.get('https://www.zoopla.co.uk/local-info/?outcode=cm15&incode=9bq')
soup = bs(res.content, 'lxml')
housing = [item.text.replace('\n','').strip() for item in soup.find_all('',id='local-info-neighbours')]
print(housing)
newspapers = [item.text.replace('\n','').strip() for item in soup.find_all('',id='neighbours-newspapers')]
print(newspapers)
I am not sure how to access an id within an id if that's what it is. Could someone help please?
Upvotes: 0
Views: 35
Reputation: 84465
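You can do this with requests and a regex. The Newspapers chart appears to be served from its own widget URL, so request that URL directly and parse the chart data out of the response: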
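import requests
import re
import ast

# Browser-like headers sent with the request
headers = {
    'Referer': 'https://www.zoopla.co.uk/',
    'User-Agent': 'Mozilla/5.0'
}
res = requests.get('https://www.zoopla.co.uk/widgets/local-info/neighbours-chart.html?outcode=cm15&incode=9bq&category=Newspapers', headers=headers)
# Capture everything from the categories array to the last closing bracket
data = re.search(r'categories: (\[.*])', res.text, flags=re.DOTALL).group(1)
# Without DOTALL, each single-line [...] array is matched separately
items = re.findall(r'(\[.*])', data)
papers = ast.literal_eval(items[0])   # first array: newspaper names
numbers = ast.literal_eval(items[1])  # second array: the matching values
result = list(zip(papers, numbers))   # pair each paper with its number
print(result)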
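To see how the two regex passes behave without hitting the site, here is a minimal offline sketch. The `sample` string is a made-up stand-in for the widget response (the real page layout may differ), and `extract_pairs` is a hypothetical helper name that also returns an empty list instead of raising when the pattern is not found.

```python
import ast
import re

# Hypothetical stand-in for the widget response; the real markup may differ.
sample = """
chart = {
    xAxis: {
        categories: ['Paper A', 'Paper B', 'Paper C']
    },
    series: [{
        data: [12.5, 30, 7]
    }]
};
"""


def extract_pairs(text):
    """Return (label, value) pairs parsed from a chart config string."""
    # With DOTALL, '.' crosses newlines, so this captures from the
    # categories array through the last ']' in the text.
    match = re.search(r'categories: (\[.*])', text, flags=re.DOTALL)
    if match is None:
        return []
    # Without DOTALL, '.' stops at newlines, so only arrays written on a
    # single line are matched, one match per line.
    arrays = re.findall(r'(\[.*])', match.group(1))
    if len(arrays) < 2:
        return []
    labels = ast.literal_eval(arrays[0])
    values = ast.literal_eval(arrays[1])
    return list(zip(labels, values))


pairs = extract_pairs(sample)
print(pairs)
```

This relies on each array sitting on a single line, as in the sample. If the response format changes, the regex will likely need adjusting.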
Upvotes: 1