Rachel
Rachel

Reputation: 1967

Searching using google and python and storing the first link?

I am trying to make use of google search and get the first URL from the search results. I tried to make use of the google custom search api. But it seems over the top for such a simple task. Hence, I am trying to use this interesting package I found: https://pypi.python.org/pypi/google

This is what I came up with so far

from google import search
url = search('my search entry', stop=1)
for result in url:
    print(url)

It seems that search() returns several generator objects. This is my return:

<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>
<generator object search at 0x10e230048>

However, I want the first external url/link. Is there a way to do that? I tried list() - but the generator seems empty.

Upvotes: 0

Views: 2213

Answers (3)

Denis Skopa
Denis Skopa

Reputation: 99

As an alternative solution, you can use BeautifulSoup web scraping library if you don't want to use either Google API or browser automation such as selenium which slows the scraping process a bunch.

Check code in online IDE

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls

# this URL params is taken from the actual Google search URL
# and transformed to a more readable format
params = {
  "q": "music",                               # query
  "gl": "us",                                 # country to search from
  "hl": "en",                                 # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

website_link = soup.select_one(".yuRUbf a")["href"]
print(website_link)

Example output

https://music.youtube.com/

More info about what CSS selectors are, and the cons of using CSS selectors.

Upvotes: 0

mikep
mikep

Reputation: 3905

From http://pythonhosted.org/google/ , the signature of search is

generator search(query, tld='com', lang='en', num=10, start=0, stop=None, pause=2.0)

Try setting num = 1 and stop = 0.

Upvotes: 1

demouser123
demouser123

Reputation: 4264

You can use Selenium as mentioned by gabriel belini. Here is the code that I wrote just a while now for this

  from selenium import webdriver
  import time
  chrome_path ="/usr/local/lib/python3.5/site-packages/selenium/chromedriver"

  driver =webdriver.Chrome(chrome_path)

  driver.get('https://google.com')


  driver.find_element_by_css_selector('input#lst-ib.gsfi').send_keys('Music')

 time.sleep(5)

 driver.find_element_by_name('btnG').click()

 time.sleep(3)

 element1 = driver.find_element_by_xpath("//*[@id='rso']/div[1]/div/div[1]/div/div/div/div/div[1]/cite")

 print(element1.text)

which outputs -> https://www.youtube.com/channel/UC-9-kyTW8ZkZNDHQJ6FgpwQ

If I search for Music keyword in search box, the first result returned is of Youtube - you can see this here

enter image description here

You can use pip to install Selenium as

  pip install -U Selenium

and download chromedriver from here. This chrome_path in above script is the path where you will keep your chromedriver executable.

Upvotes: 1

Related Questions