Priniotis

Reputation: 45

Getting an empty list when trying to extract URLs from Google with BeautifulSoup

I am trying to extract the first 100 URLs returned by a Google search for a location, but I get an empty list every time ("No search results found."):

import requests
from bs4 import BeautifulSoup

def get_location_info(location):
    query = location + " information"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    url = "https://www.google.com/search?q=" + query
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    results = soup.find_all("div", class_="r")
    websites = []
    if results:
        counter = 0
        for result in results:
            websites.append(result.find("a")["href"])
            counter += 1
            if counter == 100:
                break
    else:
        print("No search results found.")
    return websites

location = "Athens"
print(get_location_info(location))

No search results found.
[]

I have also tried this approach:

import requests
from bs4 import BeautifulSoup

def get_location_info(location):
    query = location + " information"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    url = "https://www.google.com/search?q=" + query
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    results = soup.find_all("div", class_="r")
    websites = [result.find("a")["href"] for result in results][:10]
    return websites

location = "sifnos"
print(get_location_info(location))

This also returns an empty list. I think I am doing everything suggested in similar posts, but I still get nothing.

Upvotes: 0

Views: 40

Answers (1)

HedgeHog

Reputation: 25048

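First of all, always take a look at your soup to check that all the expected ingredients are in place. Here the div elements with class "r" that your code searches for are most likely not in the HTML Google returns (its class names change over time, and without a consent cookie you may get a consent page instead of results), so find_all() comes back empty.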
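One quick way to look at the soup is to save the raw HTML and count what the relevant selectors actually match. The helper below is a hypothetical debugging sketch (the name inspect_soup and the optional dump file are made up for illustration); it takes any HTML string, so in the question's code you would pass it response.text.

```python
from bs4 import BeautifulSoup


def inspect_soup(html, dump_path=None):
    """Hypothetical debugging helper: summarise what a fetched page contains."""
    if dump_path is not None:
        # Keep a copy of the raw HTML so it can be opened in a browser or editor.
        with open(dump_path, "w", encoding="utf-8") as fh:
            fh.write(html)
    soup = BeautifulSoup(html, "html.parser")
    return {
        # The page title often tells you whether you got results or something else.
        "title": soup.title.get_text(strip=True) if soup.title else None,
        # What the question's code searches for.
        "div.r": len(soup.find_all("div", class_="r")),
        # Result titles are usually rendered as <h3> headings.
        "h3": len(soup.find_all("h3")),
        # All links that carry an href attribute.
        "links": len(soup.find_all("a", href=True)),
    }


# Made-up sample markup, only to show the shape of the summary.
sample = "<html><head><title>Sample</title></head><body><a href='/x'><h3>X</h3></a></body></html>"
print(inspect_soup(sample))  # {'title': 'Sample', 'div.r': 0, 'h3': 1, 'links': 1}
```

For example, calling print(inspect_soup(response.text, dump_path="google.html")) right after the request: a count of 0 for div.r together with a non-zero h3 count would point to a selector problem rather than an empty response.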


Select your elements more specifically; in this case, for example, with a CSS selector that matches every a element that has an h3 as a direct child:

[a.get('href') for a in soup.select('a:has(>h3)')]
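To see what a:has(>h3) keeps, here is a small offline check on made-up markup (not real Google HTML). It only assumes, as the selector above does, that result titles are h3 elements placed directly inside the result link, and it contrasts the child combinator with the looser descendant form a:has(h3).

```python
from bs4 import BeautifulSoup

# Made-up markup (not real Google HTML) with the structure the selector relies on.
html = """
<div><a href="https://example.com/a"><h3>Result A</h3></a></div>
<div><a href="https://example.com/b"><h3>Result B</h3></a></div>
<a href="https://example.com/nav">Plain link without a heading</a>
<a href="https://example.com/c"><span><h3>Heading nested one level deeper</h3></span></a>
"""

soup = BeautifulSoup(html, "html.parser")

# "a:has(>h3)": links whose *direct* child is an <h3>.
direct = [a.get("href") for a in soup.select("a:has(>h3)")]
# "a:has(h3)": links with an <h3> anywhere inside them.
anywhere = [a.get("href") for a in soup.select("a:has(h3)")]

print(direct)    # ['https://example.com/a', 'https://example.com/b']
print(anywhere)  # ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```

The plain navigation link is skipped by both selectors because it contains no h3 at all.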

To avoid the consent banner, also send a cookie:

cookies={'CONSENT':'YES+'}
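To check what actually gets sent without contacting Google, requests can build a request and show the final URL and Cookie header before anything goes over the network. This is a sketch: passing the query through params (instead of concatenating it onto the URL) is a swapped-in convenience, and the CONSENT=YES+ value is simply the one from this answer; whether Google keeps accepting it is an assumption.

```python
import requests

# Build (but do not send) the request to inspect exactly what would be sent.
request = requests.Request(
    "GET",
    "https://www.google.com/search",
    # Letting requests URL-encode the query avoids hand-building "?q=" strings.
    params={"q": "sifnos information"},
    # Cookie value taken from the answer; Google may stop honouring it.
    cookies={"CONSENT": "YES+"},
)
prepared = request.prepare()

print(prepared.url)                # https://www.google.com/search?q=sifnos+information
print(prepared.headers["Cookie"])  # CONSENT=YES+
```

The same params and cookies arguments can be passed straight to requests.get(...) together with the headers dictionary.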

Example

import requests
from bs4 import BeautifulSoup

def get_location_info(location):
    query = location + " information"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    url = "https://www.google.com/search?q=" + query
    response = requests.get(url, headers=headers, cookies={'CONSENT':'YES+'})
    soup = BeautifulSoup(response.text, 'html.parser')
    websites = [a.get('href') for a in soup.select('a:has(>h3)')]
    return websites

location = "sifnos"
print(get_location_info(location))

Output

['https://www.griechenland.de/sifnos/', 'http://de.sifnos-greece.com/plan-trip-to-sifnos/travel-information.php', 'https://www.sifnosisland.gr/', 'https://www.visitgreece.gr/islands/cyclades/sifnos/', 'http://www.griechenland-insel.de/Hauptseiten/sifnos.htm', 'https://worldonabudget.de/sifnos-griechenland/', 'https://goodmorningworld.de/sifnos-griechenland/', 'https://de.wikipedia.org/wiki/Sifnos', 'https://sifnos.gr/en/sifnos/', 'https://www.discovergreece.com/de/cyclades/sifnos']
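This example parses a single results page, which is why the output above has ten links, while the question asks for the first 100. A common way to go further is to request several pages with an offset. The sketch below assumes that Google's start query parameter offsets the results in steps of ten (an undocumented interface that may change) and uses hypothetical helper names, build_page_params and merge_links; it only prepares the parameters and merges per-page link lists, so it runs offline.

```python
from urllib.parse import urlencode


def build_page_params(query, pages, page_size=10):
    """Hypothetical helper: query parameters for consecutive result pages.

    Assumes the search endpoint accepts a `start` offset (not guaranteed).
    """
    return [{"q": query, "start": page * page_size} for page in range(pages)]


def merge_links(pages_of_links, limit=100):
    """Hypothetical helper: flatten per-page lists, drop None/duplicates, cap at `limit`."""
    seen = set()
    merged = []
    for links in pages_of_links:
        for href in links:
            if href and href not in seen:
                seen.add(href)
                merged.append(href)
                if len(merged) == limit:
                    return merged
    return merged


params = build_page_params("sifnos information", pages=3)
print([urlencode(p) for p in params])
# ['q=sifnos+information&start=0', 'q=sifnos+information&start=10', 'q=sifnos+information&start=20']
print(merge_links([["a", "b"], ["b", None, "c"]], limit=2))  # ['a', 'b']
```

Each dictionary from build_page_params could be passed as params= to requests.get("https://www.google.com/search", ...) along with the headers and cookies above, the links of each page extracted with soup.select('a:has(>h3)'), and the per-page lists handed to merge_links.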

Upvotes: 1
