HGH

Reputation: 49

Problem web scraping Google with Python and Beautiful Soup

I am writing code that should open some of the pages found in the search results.

import bs4
import requests

url = 'https://www.google.com/search?q=python'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
list_sites = soup.select('a[href]')
print(len(list_sites))

I want to search Google for a term such as 'python' and then open the first few result links, but I have a problem with the select function. What selector should I pass to find the links to the result pages, such as: Polish Python Coders Group - News, Welcome to Python.org, ...? I tried a[href], a, and the h3 class, but none of them work.

Upvotes: 1

Views: 108

Answers (1)

LucasBorges-Santos

Reputation: 392

Is this what you need?

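from bs4 import BeautifulSoup
import requests

def print_extracted_data_from_url(url):
    # Send a browser-like User-Agent; Google may serve different markup to other clients.
    headers = {
        "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    # The 'lxml' parser needs the lxml package to be installed.
    soup = BeautifulSoup(response.text, 'lxml')

    # Each result sits in a div with class 'tF2Cxc' (Google can change this name at any time).
    for container in soup.find_all('div', class_='tF2Cxc'):
        # Skip containers that have no link instead of raising an error.
        if container.a is not None and container.a.has_attr('href'):
            print(container.a['href'])

    # The "next page" link, or None when there is no further page.
    return soup.select_one('a#pnnext')


next_page_node = print_extracted_data_from_url('https://www.google.com/search?hl=en-US&q=python')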
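
If it helps, here is a rough sketch of how the two pieces above (the result links and the `a#pnnext` node) could be turned into plain URLs. It runs on a small made-up HTML string rather than a live Google page, so `SAMPLE_HTML`, `extract_result_links`, and `next_page_url` are hypothetical names for illustration only, and the `tF2Cxc` class and `pnnext` id are simply the ones used in the answer; Google may use different markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for a results page.
SAMPLE_HTML = """
<div class="tF2Cxc"><a href="https://www.python.org/"><h3>Welcome to Python.org</h3></a></div>
<div class="tF2Cxc"><a href="https://pl.python.org/"><h3>Polish Python Coders Group - News</h3></a></div>
<div class="tF2Cxc"><span>no link here</span></div>
<a id="pnnext" href="/search?q=python&amp;start=10">Next</a>
"""


def extract_result_links(html):
    """Return the first link of each result container, skipping containers without one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for container in soup.find_all("div", class_="tF2Cxc"):
        anchor = container.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


def next_page_url(html, base_url):
    """Resolve the relative 'next page' href against base_url, or return None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("a#pnnext")
    if node is None or not node.get("href"):
        return None
    return urljoin(base_url, node["href"])


if __name__ == "__main__":
    for link in extract_result_links(SAMPLE_HTML):
        print(link)
    print(next_page_url(SAMPLE_HTML, "https://www.google.com/search?q=python"))
```

The next-page href is relative, so `urljoin` is used to build a full URL that can be passed back to `requests.get`. Returning `None` when the node is missing gives a loop a simple stop condition on the last page.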

Upvotes: 0
