Sid

Reputation: 2189

SyntaxError while scraping Google with BeautifulSoup

I am scraping Google search results, but I keep getting a SyntaxError. Here's the code:

import urllib.request
from bs4 import BeautifulSoup
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/70.0'

url = "https://www.google.com/search?hl=en&q=python+wikipedia"
headers={'User-Agent':user_agent,} 

request=urllib.request.Request(url,None,headers) #The assembled request
response = urllib.request.urlopen(request)
data = response.read()

soup= BeautifulSoup(data, 'html.parser')
l = soup.find_all('h' , 'attrs' = {"class":'LC20lb'})
print(l)

I get :

SyntaxError: keyword can't be an expression

on the line l = soup.find_all('h' , 'attrs' = {"class":'LC20lb'}). Can someone please tell me what I'm doing wrong?

Upvotes: 1

Views: 62

Answers (3)

Dmitriy Zub

Reputation: 1724

Try to use requests instead.
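With requests, the query string and headers can be passed as plain dicts instead of being built into the URL by hand. Below is a minimal sketch, assuming the `requests` package is installed. It only prepares the request (no network call) so you can inspect the encoded URL; `build_search_request` is a hypothetical helper name and the User-Agent string is just an example.

```python
import requests

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
)


def build_search_request(query: str) -> requests.PreparedRequest:
    """Hypothetical helper: prepare (but do not send) a Google search request."""
    req = requests.Request(
        "GET",
        "https://www.google.com/search",
        params={"q": query, "hl": "en"},  # requests URL-encodes these for you
        headers={"User-Agent": USER_AGENT},
    )
    return req.prepare()


prepared = build_search_request("python wikipedia")
print(prepared.url)

# To actually send it (needs network access):
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     response.raise_for_status()
```

Setting a timeout and calling `raise_for_status()` makes a blocked or failed request fail loudly instead of handing an error page to BeautifulSoup.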

Try CSS selectors with select()/select_one(); they're more flexible, a bit more readable, and often a bit faster.

soup.select('.LC20lb') # same elements as find_all(class_='LC20lb')
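To check that `select()` with a class selector returns the same elements as `find_all()` filtered by class, here is a small offline comparison against hand-written HTML. The markup and URLs are made up for illustration (Google's real markup changes often); it assumes `beautifulsoup4` is installed and uses the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a search results page.
HTML = """
<div class="tF2Cxc"><a href="https://example.com/a">
  <h3 class="LC20lb DKV0Md">First result</h3></a></div>
<div class="tF2Cxc"><a href="https://example.com/b">
  <h3 class="LC20lb DKV0Md">Second result</h3></a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# CSS selector: every element whose class list contains LC20lb.
by_css = soup.select(".LC20lb")
# find_all equivalent: class_ matches any single class in the class list.
by_find_all = soup.find_all(class_="LC20lb")

print(by_css == by_find_all)
print([h3.get_text() for h3 in by_css])
```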

Check out the SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired element in the browser.

Also, you don't need attrs to filter by class in find_all(): a string passed as the second argument is matched against the class, e.g.:

soup.find_all('h3', 'LC20lb') # returns a list of titles
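A quick sketch showing that the positional string, the `class_` keyword and an `attrs` dict all find the same `<h3>` elements. It assumes `beautifulsoup4` is installed; the markup is made up.

```python
from bs4 import BeautifulSoup

HTML = (
    '<h3 class="LC20lb">First</h3>'
    '<h3 class="other">Second</h3>'
    '<h3 class="LC20lb">Third</h3>'
)
soup = BeautifulSoup(HTML, "html.parser")

positional = soup.find_all("h3", "LC20lb")                # a string here is treated as a class
keyword = soup.find_all("h3", class_="LC20lb")            # class_ avoids the reserved word "class"
attrs_dict = soup.find_all("h3", attrs={"class": "LC20lb"})

print([h.get_text() for h in positional])  # ['First', 'Third']
```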

Full example:

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "python wikipedia"
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

# container with all titles
for result in soup.select('.tF2Cxc'):
  # extract the title from each result container by its CSS selector
  title = result.select_one('.DKV0Md').text
  print(title)

-----
'''
Python (programming language) - Wikipedia
Python - Wikipedia
History of Python - Wikipedia
wikipedia 1.4.0 - PyPI
What is Python? Executive Summary
Python Wiki: FrontPage
BeginnersGuide/Programmers - Python Wiki
Wikipedia API for Python. In this tutorial let us understand the…
Wikipedia — wikipedia 0.9 documentation
'''
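Separating fetching from parsing lets you test the parsing offline, without hitting Google each time. A sketch, assuming `beautifulsoup4` is installed; `extract_titles` is a hypothetical helper, and `.tF2Cxc`/`.DKV0Md` are the class names from the example above, which Google may change at any time. Unlike the loop above, containers without a title are skipped instead of raising `AttributeError` on `None.text`.

```python
from bs4 import BeautifulSoup


def extract_titles(html: str) -> list[str]:
    """Hypothetical helper: return result titles from a search results page."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for result in soup.select(".tF2Cxc"):  # one container per organic result
        title = result.select_one(".DKV0Md")  # None if this container has no title
        if title is not None:
            titles.append(title.get_text(strip=True))
    return titles


# Made-up markup for an offline check; the second container has no title.
sample = (
    '<div class="tF2Cxc"><h3 class="LC20lb DKV0Md"> Python - Wikipedia </h3></div>'
    '<div class="tF2Cxc"><span>no title here</span></div>'
)
print(extract_titles(sample))  # ['Python - Wikipedia']
```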

Alternatively, you can achieve the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you only iterate over structured JSON and pick the fields you need, rather than working out how to parse the HTML yourself.

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "python Wikipedia",
    "hl": "en",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
  title = result['title']
  print(title)

------
'''
Python - Wikipedia
History of Python - Wikipedia
wikipedia 1.4.0 - PyPI
What is Python? Executive Summary
Python Wiki: FrontPage
BeginnersGuide/Programmers - Python Wiki
Wikipedia API for Python. In this tutorial let us understand the…
Wikipedia — wikipedia 0.9 documentation
'''
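Since the response is plain JSON, you can make the loop tolerant of missing fields with `.get()`. The dict below is hand-written sample data shaped like `results["organic_results"]` in the code above; it is not real API output, and the `link` key is an assumption for illustration.

```python
# Hypothetical data shaped like results["organic_results"]; not real API output.
results = {
    "organic_results": [
        {"position": 1, "title": "Python - Wikipedia", "link": "https://example.com/python"},
        {"position": 2, "title": "History of Python - Wikipedia"},  # no "link" key
    ]
}

rows = []
for result in results.get("organic_results", []):
    # .get() returns None instead of raising KeyError when a field is absent.
    rows.append((result.get("title"), result.get("link")))

for title, link in rows:
    print(title, link or "(no link)")
```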

Disclaimer: I work for SerpApi.

Upvotes: 0

krishna

Reputation: 1029

import urllib.request
from bs4 import BeautifulSoup
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/70.0'

url = "https://www.google.com/search?hl=en&q=python+wikipedia"
headers={'User-Agent':user_agent,}

request=urllib.request.Request(url,None,headers) #The assembled request
response = urllib.request.urlopen(request)
data = response.read()

soup = BeautifulSoup(data, 'html.parser')
l = soup.find_all('h3', {"class": 'LC20lb'})  # the dict is passed positionally as attrs; 'h' is not a tag name
print(l)
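This works because the second positional parameter of `find_all()` is `attrs`, so passing the dict positionally is the same as writing `attrs={...}`. A small offline check, assuming `beautifulsoup4` is installed and using made-up markup; it also shows that the tag name `'h'` from the question matches nothing, since tag names are compared exactly.

```python
from bs4 import BeautifulSoup

html = '<h3 class="LC20lb">Python - Wikipedia</h3>'
soup = BeautifulSoup(html, "html.parser")

positional = soup.find_all("h3", {"class": "LC20lb"})      # dict as 2nd positional arg = attrs
keyword = soup.find_all("h3", attrs={"class": "LC20lb"})  # same search, spelled out
no_match = soup.find_all("h", {"class": "LC20lb"})          # "h" is not an HTML tag

print(len(positional), len(keyword), len(no_match))  # 1 1 0
```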

Upvotes: 1

Petr Blahos

Reputation: 2433

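There should be no quotes around attrs. A keyword argument's name must be a bare identifier; 'attrs' in quotes is a string literal, so Python rejects the line with a SyntaxError before find_all() is ever called: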

l = soup.find_all('h' ,   attrs  = {"class":'LC20lb'})
# not:                   _     _
#l = soup.find_all('h' , 'attrs' = {"class":'LC20lb'})    
#                        ^     ^
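You can confirm that this is a parse-time error, unrelated to BeautifulSoup, by compiling both spellings without running them. A minimal sketch using only the built-in `compile()`; `parses` is a hypothetical helper name. The exact error message differs between Python versions (older versions say "keyword can't be an expression"), so only the pass/fail result is relied on here.

```python
bad = "soup.find_all('h3', 'attrs' = {'class': 'LC20lb'})"
good = "soup.find_all('h3', attrs={'class': 'LC20lb'})"


def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")  # parse only; nothing is executed
    except SyntaxError as exc:
        print(f"SyntaxError: {exc.msg}")
        return False
    return True


print(parses(bad))   # prints the error message, then False
print(parses(good))  # True
```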

Upvotes: 1
