Reputation: 4827
I just started with BeautifulSoup and would like to extract the variables name, brand and price from the website http://www.mediamarkt.nl/nl/category/_laptops-482723.html, but I can't get it working. The data sits in script tags like this:
<script> var product1511322 = {"name":"ACER Aspire 3 A315-31-C3PK","id":"1511322","price":"399.00","brand":"ACER","ean":"4713883258289","dimension25":"InStock","dimension26":1.99,"dimension24":21.00,"category":"Computer","dimension9":"Laptops","dimension10":"Windows-laptops"}; </script>
I tried...
from bs4 import BeautifulSoup
import requests
url = 'http://www.mediamarkt.nl/nl/category/_laptops-482723.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
script = soup.find_all('script')
script.find_all('var')
...but that doesn't work.
Does anyone have suggestions on how to extract all the name, brand and price information into a list or DataFrame?
Upvotes: 0
Views: 1323
Reputation: 8047
I just noticed you asked for a list or a DataFrame. This builds a list; if you really want a DataFrame, it should be easy to build one from this result.
from bs4 import BeautifulSoup
import requests
import ast  # abstract syntax tree module, used to parse the dictionary text safely

url = 'http://www.mediamarkt.nl/nl/category/_laptops-482723.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
scripts = soup.find_all('script')

infos = []
for s in scripts:
    text = (s.string or '').strip()  # script body; None for empty or external scripts
    if text.startswith('var product'):  # find the scripts of interest
        # keep only the object literal to the right of ' = ' and drop the trailing ';'
        d = text.split(' = ', 1)[1].rstrip(';')
        # parse the object literal as a Python dict
        data = ast.literal_eval(d)
        infos.append(data)

# Here's the list
# print(infos)  # [{'category': 'Computer', 'name': 'HP Pavilion X360 14-BA081ND', ... 'dimension9': 'Laptops', 'dimension10': 'Windows-laptops', 'brand': 'LENOVO'}]
# for i in infos:
#     print(i['name'])   # HP Pavilion X360 14-BA081ND
#     print(i['brand'])  # HP
#     print(i['price'])  # 629.00
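# print i['price'] # 629.00">
If a DataFrame is what you are after, here is a minimal stdlib-only sketch of the reshaping step. It uses one dict trimmed from the question's snippet as sample data, and to_columns is a made-up helper name. It turns the list of product dicts into one list per column:

```python
# Sample data: one dict trimmed from the product snippet in the question.
infos = [
    {"name": "ACER Aspire 3 A315-31-C3PK", "id": "1511322", "price": "399.00",
     "brand": "ACER", "category": "Computer"},
]


def to_columns(products, fields=("name", "brand", "price")):
    """Hypothetical helper: turn a list of product dicts into a dict of column lists."""
    columns = {field: [] for field in fields}
    for product in products:
        for field in fields:
            # .get() yields None for products that lack a field
            columns[field].append(product.get(field))
    return columns


table = to_columns(infos)
# Prices arrive as strings such as "399.00"; convert them for numeric work.
table["price"] = [float(p) if p is not None else None for p in table["price"]]
print(table)
# {'name': ['ACER Aspire 3 A315-31-C3PK'], 'brand': ['ACER'], 'price': [399.0]}
```

With pandas installed, pandas.DataFrame(table) turns this dict of lists into a DataFrame. pandas.DataFrame(infos) also works directly on the original list of dicts.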
There's probably a better way, but hope this helps.
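One possible alternative for the parsing step pulls the object out with a regular expression and parses it with json.loads instead of ast.literal_eval. This sketch assumes every product is declared as var productNNN = {...}; exactly as in the question's snippet. The object literal there is valid JSON, and json.loads also accepts true, false and null, which literal_eval rejects. PRODUCT_RE and parse_products are hypothetical names:

```python
import json
import re

# Assumed shape, based on the question's snippet:  var product1511322 = {...};
# The non-greedy match assumes the object itself never contains "};".
PRODUCT_RE = re.compile(r"var\s+product\d+\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def parse_products(script_text):
    """Return every product dict declared in a block of JavaScript or HTML text."""
    return [json.loads(m.group(1)) for m in PRODUCT_RE.finditer(script_text)]


sample = ('<script> var product1511322 = {"name":"ACER Aspire 3 A315-31-C3PK",'
          '"id":"1511322","price":"399.00","brand":"ACER","ean":"4713883258289",'
          '"dimension25":"InStock","dimension26":1.99,"dimension24":21.00,'
          '"category":"Computer","dimension9":"Laptops",'
          '"dimension10":"Windows-laptops"}; </script>')

for p in parse_products(sample):
    print(p["name"], p["brand"], p["price"])  # ACER Aspire 3 A315-31-C3PK ACER 399.00
```

Because the regex only looks for the var productNNN = {...}; pattern, parse_products can be run on each script's .string or on the whole page HTML at once (requests.get(url).text).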
Upvotes: 2