Reputation: 549
I have a problem with this Python script. I'm trying to pass the values from a list that has some strings in it. I've attached the script. In this command:
page = requests.get("https://www.google.dz/search?q=lista[url]")
I have to put what I'm looking for on Google after search?q=. I want to search for multiple keywords, so I made a list. I don't know how to pass the values from the list into that command...
import requests
import re
from bs4 import BeautifulSoup
lista = []
lista.append("Samsung S9")
lista.append("Samsung S8")
lista.append("Samsung Note 9")
list_scrape = []
for url in lista:
    page = requests.get("https://www.google.dz/search?q=lista[url]")
    soup = BeautifulSoup(page.content)
    links = soup.findAll("a")
    for link in soup.find_all("a",href=re.compile("(?<=/url\?q=)(htt.*://.*)")):
        list_scrape.append(re.split(":(?=http)",link["href"].replace("/url?q=","")))
print(list_scrape)
Thank you!
Upvotes: 1
Views: 1748
Reputation: 1724
You can use an f-string instead, which in my opinion is a more Pythonic way to format strings:
requests.get(f"https://www.google.dz/search?q={url}")
# or
for query in queries:
    html = requests.get(f"https://www.google.dz/search?q={query}")
Note that the next problem you might hit is Google blocking your request because no user-agent is specified. The default requests user-agent is python-requests, so Google recognizes that the request does not come from a "real" user visit and blocks it. Check what your user-agent is.
Code:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]
for query in queries:
    params = {
        "q": query,
        "gl": "uk",
        "hl": "en"
    }
    html = requests.get("https://www.google.com/search", headers=headers, params=params)
    soup = BeautifulSoup(html.text, "lxml")
    for result in soup.select('.tF2Cxc'):
        title = result.select_one('.DKV0Md').text
        link = result.select_one('.yuRUbf a')['href']
        print(f"{title}\n{link}\n")
-------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/
Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8
Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases
Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''
Alternatively, you can achieve the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't need to think about how to extract certain things or figure out why something isn't working as it should work. All that really needs to be done is to iterate over structured JSON and get the data you want fast without any headache.
Code to integrate:
import os
from serpapi import GoogleSearch
queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]
for query in queries:
    params = {
        "engine": "google",
        "q": query,
        "hl": "en",
        "gl": "uk",
        "api_key": os.getenv("API_KEY"),
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    for result in results["organic_results"]:
        print(result['title'])
        print(result['link'])
        print()
------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/
Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8
Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases
Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''
Disclaimer, I work for SerpApi.
Upvotes: 1
Reputation: 133
Try this:
for url in lista:
    page = requests.get("https://www.google.dz/search?q="+url)
or
for url in lista:
    page = requests.get("https://www.google.dz/search?q={}".format(url))
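One caveat with plain concatenation: the keywords contain spaces, and a keyword containing a character such as & or # would change the meaning of the URL. Here is a minimal sketch, using only the standard library, of encoding each keyword before it goes into the query string. The helper name build_search_url is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.dz/search"


def build_search_url(keyword):
    """Return the search URL for one keyword, with the keyword URL-encoded.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes spaces and reserved characters in the value
    query_string = urlencode({"q": keyword})
    return BASE_URL + "?" + query_string


lista = ["Samsung S9", "Samsung S8", "Samsung Note 9"]
urls = [build_search_url(keyword) for keyword in lista]

for url in urls:
    print(url)
```

Passing params={"q": keyword} to requests.get should give the same encoding without building the URL by hand.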
Upvotes: 1
Reputation: 3669
Use format:
for url in lista:
    page = requests.get("https://www.google.dz/search?q={}".format(url))
Or:
for url in lista:
    page = requests.get("https://www.google.dz/search?q=%s" % url)
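To see why the original line did not work, it may help to compare the strings without sending any request. In the small sketch below, the text lista[url] inside a plain string literal stays as literal text, while each formatting approach substitutes the current keyword. The variable names are illustrative only.

```python
lista = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

# Inside a plain string literal nothing is substituted:
# "lista[url]" is sent to Google as the literal search text.
literal = "https://www.google.dz/search?q=lista[url]"

built = []
for url in lista:
    with_concat = "https://www.google.dz/search?q=" + url
    with_format = "https://www.google.dz/search?q={}".format(url)
    with_percent = "https://www.google.dz/search?q=%s" % url
    with_fstring = f"https://www.google.dz/search?q={url}"
    # All four approaches build the same string for a given keyword
    built.append((with_concat, with_format, with_percent, with_fstring))

for forms in built:
    print(forms[0])
```

Whichever form you pick is mostly a matter of style; the f-string needs Python 3.6 or newer.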
Upvotes: 2