
How can I scrape supermarket nutrient data with Python's requests?

I want to scrape nutrient data from this page: http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html.

I tried the following code.

import requests, bs4
res = requests.get('http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html')

But the response's text does not match the HTML I see when I inspect the page with a browser.

So I can't use Beautiful Soup to search it.

How can I fix this?

Upvotes: 0

Answers (3)

HimanshuGahlot

Reputation: 571

This happens because a site can serve different HTML to different clients, depending on the User-Agent header each one sends. requests identifies itself with its own User-Agent, so it may not receive the same markup as your browser.
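If the User-Agent is the cause, one thing to try is sending a browser-like header with requests. This is only a sketch: the User-Agent string below is an example value, `fetch_html` is a hypothetical helper name, and there is no guarantee the site will respond with the same markup your browser gets.

```python
import requests

# Example browser-like User-Agent string (an assumption for illustration;
# copy the real value from your own browser's developer tools if needed).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url):
    """Fetch url with the browser-like headers and return the response body as text."""
    res = requests.get(url, headers=HEADERS, timeout=10)
    res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return res.text


if __name__ == "__main__":
    url = ("http://www.mysupermarket.co.uk/tesco-price-comparison/"
           "Fruit/Tesco_Gala_Apple_Approx_160g.html")
    html = fetch_html(url)
    print(len(html))
```

If the text still differs from what the inspector shows, the header is probably not the reason.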

If you want to see the same HTML as in the browser, use Selenium WebDriver. It is easy and convenient to use. Once the page has loaded, take the page source and run Beautiful Soup on it.
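A minimal sketch of that approach, assuming Selenium and Chrome are installed locally. The helper names `fetch_rendered_html` and `parse_title` are made up for illustration, and reading the page title is only a stand-in for whatever elements you actually need.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load url in a real browser and return the page source as the browser sees it."""
    # Imported here so the parsing helper below also works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome is available on this machine
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def parse_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


if __name__ == "__main__":
    url = ("http://www.mysupermarket.co.uk/tesco-price-comparison/"
           "Fruit/Tesco_Gala_Apple_Approx_160g.html")
    print(parse_title(fetch_rendered_html(url)))
```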

If you want to learn how to use Selenium, check it out here

If you still run into problems, feel free to get in touch.

Upvotes: 0

T.S.

Reputation: 302

A good alternative would be the newly released requests-HTML library, written by the author of requests.

With it, you can parse HTML as simply as this:

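from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://python.org/')

# CSS selector for the element to extract; adjust it to the page you are scraping
sel = 'body > div.application-main > div.jumbotron.jumbotron-codelines > div > div > div.col-md-7.text-center.text-md-left > p'

# find() returns None when nothing matches, so check before reading .text
element = r.html.find(sel, first=True)
print(element.text if element is not None else 'No element matched the selector')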

Check it out at the official site.

Thank you.

Upvotes: 2

Ulf Aslak

Reputation: 8628

You need to retrieve the markup from the .text attribute of the res object. Your code should then read:

import requests, bs4
res = requests.get('http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html')
html = res.text
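From there, the markup can be handed to Beautiful Soup. The sketch below assumes the data you want sits in an HTML table, which may not be true for this page, and `table_rows` is a hypothetical helper name.

```python
import requests
from bs4 import BeautifulSoup


def table_rows(html):
    """Return every non-empty table row in html as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    res = requests.get(
        "http://www.mysupermarket.co.uk/tesco-price-comparison/"
        "Fruit/Tesco_Gala_Apple_Approx_160g.html",
        timeout=10,
    )
    res.raise_for_status()
    for row in table_rows(res.text):
        print(row)
```

If this prints nothing, the rows are probably not in the HTML that requests receives.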

Upvotes: 0
