Reputation:
I'm trying to make a script that will scrape the first link of a google search so that it will give me back only the first link so I can run a search in the terminal and look at the link later on with the search term. I'm struggling to only get the first result. This is the closest thing I've got so far.
import requests
from bs4 import BeautifulSoup
research_later = "hiya"
goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + research_later
r = requests.get(goog_search)
soup = BeautifulSoup(r.text)
for link in soup.find_all('a'):
print research_later + " :"+link.get('href')
Upvotes: 12
Views: 12028
Reputation: 1724
You can use either select_one()
for selecting CSS
selectors or find()
bs4
methods to get only one element from the page. To grab CSS
selectors you can use SelectorGadget extension.
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, json
headers = {
'User-agent':
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
html = requests.get('https://www.google.com/search?q=ice cream', headers=headers).text
soup = BeautifulSoup(html, 'lxml')
# locating .tF2Cxc class
# calling for <a> tag and then calling for 'href' attribute
link = soup.select('.yuRUbf a')['href']
print(link)
# https://en.wikipedia.org/wiki/Ice_cream
Alternatively, you can do the same thing by using Google Search Engine Results API from SerpApi. It's a paid API with a free plan.
The main difference is that everything (selecting, bypass blocks, proxy rotation, and more) is already done for the end-user with a JSON output.
Code to integrate:
params = {
"engine": "google",
"q": "ice cream",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
# [0] - first index from the search results
link = results['organic_results'][0]['link']
print(link)
# https://en.wikipedia.org/wiki/Ice_cream
Disclaimer, I work for SerpApi.
Upvotes: 1
Reputation: 22292
Seems like Google use cite
tag to save the link, so we can just use soup.find('cite').text
like this:
import requests
from bs4 import BeautifulSoup
research_later = "hiya"
goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + research_later
r = requests.get(goog_search)
soup = BeautifulSoup(r.text, "html.parser")
print soup.find('cite').text
Output is:
www.urbandictionary.com/define.php?term=hiya
Upvotes: 10