Reputation: 8793
I'm working through the first example of the web scraping tutorial from the book "Automate the Boring Stuff with Python". The project consists of typing a search term on the command line and having my computer automatically open a browser with all the top search results in new tabs.
It mentions that I need to locate the
<h3 class="r">
element in the page source; these elements contain the links to each search result. The r class is used only for search result links.
But the problem is that I can't find it anywhere, even using Chrome DevTools. Any help as to where it is would be greatly appreciated.
Note: Just for reference, this is the complete program as it appears in the book.
# lucky.py - Opens several Google search results.
import requests, sys, webbrowser, bs4
print('Googling...')  # display text while downloading the Google page
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text)
# Open a browser tab for each result.
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))
Upvotes: 2
Views: 487
Reputation: 905
This will work for you:
>>> import requests
>>> from lxml import html
>>> r = requests.get("https://www.google.co.uk/search?q=how+to+do+web+scraping&num=10")
>>> source = html.fromstring((r.text).encode('utf-8'))
>>> links = source.xpath('//h3[@class="r"]//a//@href')
>>> for link in links:
...     print link.replace("/url?q=","").split("&sa=")[0]
...
Output:
http://newcoder.io/scrape/intro/
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
http://docs.python-guide.org/en/latest/scenarios/scrape/
http://webscraper.io/
https://blog.hartleybrody.com/web-scraping/
https://first-web-scraper.readthedocs.io/
https://www.youtube.com/watch%3Fv%3DE7wB__M9fdw
http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/
http://analystcave.com/web-scraping-tutorial/
https://en.wikipedia.org/wiki/Web_scraping
Note: I am using Python 2.7.x. For Python 3.x, you just have to wrap the print argument in parentheses, like this: print(link.replace("/url?q=","").split("&sa=")[0])
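If you are on Python 3, one possible alternative to the replace/split step is to parse the href with the standard library's urllib.parse. This is only a sketch: the helper name unwrap_google_href is hypothetical, and it assumes the hrefs still look like "/url?q=<target>&sa=..." as in the loop above. A side effect is that percent-escapes in the target are decoded, so a link like the YouTube one in the output comes back with "?v=" instead of "%3Fv%3D".

```python
from urllib.parse import urlsplit, parse_qs


def unwrap_google_href(href):
    """Return the target of a '/url?q=...' redirect href, or href unchanged.

    Hypothetical helper; assumes the target URL is carried in the 'q'
    query parameter, as in the hrefs handled by the loop above.
    """
    parts = urlsplit(href)
    if parts.path == "/url":
        # parse_qs maps each parameter name to a list of decoded values
        targets = parse_qs(parts.query).get("q")
        if targets:
            return targets[0]
    # Anything that is not a '/url' redirect is passed through as-is
    return href


if __name__ == "__main__":
    # Sample href built from one of the results in the output above
    sample = "/url?q=https://www.youtube.com/watch%3Fv%3DE7wB__M9fdw&sa=U"
    print(unwrap_google_href(sample))
```

You would call it on each item of links in place of the replace/split expression, for example print(unwrap_google_href(link)).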
Upvotes: 2