PyRar

Reputation: 549

How to pass the value from a list in a url string

I have a problem with this Python script. I'm attempting to pass the values from a list that has some strings in it. I've attached the script. In the call page = requests.get("https://www.google.dz/search?q=lista[url]") I have to put what I'm looking for on Google after search?q=. I want to search for multiple keywords, so I made a list. I don't know how to pass the values from the list into that call...

import requests
import re
from bs4 import BeautifulSoup

lista = []
lista.append("Samsung S9")
lista.append("Samsung S8")
lista.append("Samsung Note 9")

list_scrape = []

for url in lista:
    page = requests.get("https://www.google.dz/search?q=lista[url]")
    soup = BeautifulSoup(page.content)
    links = soup.findAll("a")
    for link in soup.find_all("a", href=re.compile("(?<=/url\?q=)(htt.*://.*)")):
        list_scrape.append(re.split(":(?=http)",link["href"].replace("/url?q=","")))

print(list_scrape)

Thank you!

Upvotes: 1

Views: 1748

Answers (3)

Dmitriy Zub

Reputation: 1724

You can use an f-string instead, which in my opinion is a more Pythonic way to do string formatting:

requests.get(f"https://www.google.dz/search?q={url}")
# or
for query in queries:
   html = requests.get(f"https://www.google.dz/search?q={query}")
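To see why the original line did not work, here is a minimal sketch that only builds the strings and sends nothing over the network. The names base, literal_urls and fstring_urls are mine, chosen for illustration; lista is the list from the question.

```python
lista = ["Samsung S9", "Samsung S8", "Samsung Note 9"]
base = "https://www.google.dz/search?q="

# Plain string: the text "lista[url]" is used literally, nothing is substituted.
literal_urls = [base + "lista[url]" for url in lista]

# f-string: {url} is replaced by the current list item on each iteration.
fstring_urls = [f"{base}{url}" for url in lista]

print(literal_urls[0])
print(fstring_urls[0])
```

Note that the f-string version still contains a raw space in the query, so it relies on the HTTP library to escape it.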

Note that the next problem you may hit is that Google blocks your request because no user-agent is specified.

The default requests user-agent is python-requests. Google recognizes it and blocks the request because it does not look like a visit from a real user. Check what your user-agent is.
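If you want to check this locally, the sketch below prints the default user-agent and prepares (but does not send) a request with a custom one. It assumes the requests package is installed; the shortened User-Agent string is only a placeholder for illustration.

```python
import requests

# The header value requests uses when you do not supply your own.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Placeholder browser-like value, shortened for illustration.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Prepare the request without sending it, to inspect what would go out.
prepared = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "Samsung S9"},
    headers=headers,
).prepare()

print(prepared.headers["User-Agent"])
print(prepared.url)
```

Preparing the request is a convenient way to inspect the final URL and headers before anything reaches Google.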


Code:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

for query in queries:
  params = {
    "q": query,
    "gl": "uk",
    "hl": "en"
  }

  html = requests.get("https://www.google.com/search", headers=headers, params=params)
  soup = BeautifulSoup(html.text, "lxml")
  
  for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']

    print(f"{title}\n{link}\n")

-------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/

Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8

Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases

Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''

Alternatively, you can achieve the same thing by using the Google Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't need to think about how to extract certain things or figure out why something isn't working as it should work. All that really needs to be done is to iterate over structured JSON and get the data you want fast without any headache.

Code to integrate:

import os
from serpapi import GoogleSearch

queries = ["Samsung S9", "Samsung S8", "Samsung Note 9"]

for query in queries:
  params = {
      "engine": "google",
      "q": query,
      "hl": "en",
      "gl": "uk",
      "api_key": os.getenv("API_KEY"),
  }

  search = GoogleSearch(params)
  results = search.get_dict()

  for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
    print()

------
'''
Samsung Galaxy S9 and S9+ | Buy or See Specs
https://www.samsung.com/uk/smartphones/galaxy-s9/

Samsung Galaxy S9 - Full phone specifications - GSMArena ...
https://www.gsmarena.com/samsung_galaxy_s9-8966.php
...
Samsung Galaxy S8 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_S8

Samsung Galaxy S8 Price in India - Gadgets 360
https://gadgets.ndtv.com/samsung-galaxy-s8-4009
...
Samsung Galaxy Note 9 Cases - Mobile Fun
https://www.mobilefun.co.uk/samsung/galaxy-note-9/cases

Samsung Galaxy Note 9 - Wikipedia
https://en.wikipedia.org/wiki/Samsung_Galaxy_Note_9
'''

Disclaimer: I work for SerpApi.

Upvotes: 1

Sushant Kathuria

Reputation: 133

Try this:

for url in lista:
    page = requests.get("https://www.google.dz/search?q="+url)

or

page = requests.get("https://www.google.dz/search?q={}".format(url))

Upvotes: 1

Sushant

Reputation: 3669

Use str.format:

for url in lista:
    page = requests.get("https://www.google.dz/search?q={}".format(url))

Or

page = requests.get("https://www.google.dz/search?q=%s" % url)
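One caveat with building the URL by hand: the search terms contain spaces, so it may be safer to escape them explicitly. A minimal sketch using only the standard library follows; base, urls and query_string are names I made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import quote_plus, urlencode

lista = ["Samsung S9", "Samsung S8", "Samsung Note 9"]
base = "https://www.google.dz/search"

# quote_plus escapes a single value; spaces become "+".
urls = ["{}?q={}".format(base, quote_plus(term)) for term in lista]

# urlencode builds a whole query string from a dict of parameters.
query_string = urlencode({"q": "Samsung Note 9", "hl": "en"})

for u in urls:
    print(u)
print(query_string)
```

Each entry in urls can then be passed to requests.get as before.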

Upvotes: 2
