Python web scraping with BeautifulSoup to extract variables

Question

I just started with BeautifulSoup and would like to extract the variables name, brand and price from the website http://www.mediamarkt.nl/nl/category/_laptops-482723.html but I can't get it working.

I tried...

from bs4 import BeautifulSoup
import requests

url = 'http://www.mediamarkt.nl/nl/category/_laptops-482723.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

script = soup.find_all('script')

script.find_all('var')

...but that doesn't work

Does anyone have suggestions for how to extract all the name, brand and price information into a list or dataframe?

chickity china chinese chicken · Accepted Answer

I just noticed you wanted a list or a dataframe. This gets a list; if you really want a dataframe, it should be easy to adapt from this result.

from bs4 import BeautifulSoup
import requests
import ast  # abstract syntax tree to parse dictionary text

url = 'http://www.mediamarkt.nl/nl/category/_laptops-482723.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

scripts = soup.find_all('script')
infos = []

for s in scripts:
    if 'var product' in s.text[0:12]:          # find the script of interest
        # take everything after the first ' = ', then drop surrounding
        # whitespace and the trailing semicolon
        d = s.text.split(' = ', 1)[1].strip().rstrip(';')
        # parse information as dictionary text
        data = ast.literal_eval(d)

        infos.append(data)

# Here's the list
# print(infos)  #  [{'category': 'Computer', 'name': 'HP Pavilion X360 14-BA081ND', ... 'dimension9': 'Laptops', 'dimension10': 'Windows-laptops', 'brand': 'LENOVO'}]

# for i in infos:
#     print(i['name'])   # HP Pavilion X360 14-BA081ND
#     print(i['brand'])  # HP
#     print(i['price'])  # 629.00
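
If a dataframe is what you need, a list of dictionaries like `infos` converts directly. This is only a minimal sketch: it assumes pandas is installed, and it uses a hand-written stand-in for `infos` with made-up values (the real entries carry more keys). It also assumes the prices come through as strings, which is how they appear to print above.

```python
import pandas as pd

# Hypothetical stand-in for the scraped `infos` list; values are illustrative only.
infos = [
    {'name': 'Example Laptop A', 'brand': 'BRAND-A', 'price': '629.00'},
    {'name': 'Example Laptop B', 'brand': 'BRAND-B', 'price': '499.00'},
]

# Each dictionary becomes one row; keep only the columns of interest.
df = pd.DataFrame(infos)[['name', 'brand', 'price']]

# Assumed: prices are strings in the page data, so convert them to numbers.
df['price'] = pd.to_numeric(df['price'])

print(df)
```

With the real data you would skip the stand-in and pass the scraped `infos` straight to `pd.DataFrame`.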

There's probably a better way, but hope this helps.
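
One possible hardening of the extraction step, as a sketch rather than a tested fix for this site: a regular expression pulls out the `{...}` literal, and anything `ast.literal_eval` cannot parse is skipped instead of raising. The helper name `parse_product`, the variable name `product1` and the sample script text are all hypothetical, and the pattern assumes a flat dictionary with no nested `};` inside it.

```python
import ast
import re

# Hypothetical script contents, shaped like the ones the loop above looks for.
script_text = "var product1 = {'name': 'Example Laptop', 'brand': 'BRAND-A', 'price': '629.00'};\n"

# Matches "var product... = { ... };" and captures the braces and their contents.
PRODUCT_RE = re.compile(r"var\s+product\w*\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def parse_product(text):
    """Return the dict assigned to a `var product...` variable, or None."""
    match = PRODUCT_RE.search(text)
    if match is None:
        return None  # no product variable in this script
    try:
        return ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return None  # e.g. JavaScript-only literals such as true or null


print(parse_product(script_text))
```

In the loop above, `data = parse_product(s.text)` followed by an `if data is not None:` check could stand in for the split and strip lines.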
