Rohan A

Reputation: 97

Parse what you google search

I'd like to write a script (preferably in Python, but other languages are not a problem) that can parse what you type into a Google search. Suppose I search 'cats'; I'd then like to be able to parse the string 'cats' and, for example, append it to a .txt file on my computer.

So if my searches were 'cats', 'dogs', 'cows' then I could have a .txt file like so,

cats dogs cows

Does anyone know any APIs that can parse the search bar and return the string inputted? Or some object that I can cast to a string?

EDIT: I don't want to make a Chrome extension or anything; I'd prefer a Python (or Bash or Ruby) script I can run in the terminal that can do this.

Thanks

Upvotes: 0

Views: 286

Answers (3)

Lawrence Lin Murata

Reputation: 61

A few options you might consider, with their advantages and disadvantages:

  • URL:

    • advantage: as Chris mentioned, accessing the URL and changing it manually is an option. It should be easy to write a script for this, and I can send you my Perl script if you want.

    • disadvantage: I'm not sure whether Google allows it. I made a Perl script for this before, but it didn't work because Google states that you can't use its services outside the Google interface. You might face the same problem.

  • Google's search API:

    • advantage: a popular choice with good documentation. It should be a safe choice.

    • disadvantage: Google's restrictions.

  • Research other search engines:

    • advantage: they might not have the same restrictions as Google. You might find some search engines that let you play around more and give you more freedom in general.

    • disadvantage: you're not going to get results as good as Google's.
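To make the URL option above concrete: a Google search embeds the term in the `q` query parameter, and Python's standard library can both build and take apart such a URL. A minimal Python 3 sketch (the parameter name `q` is the one Google actually uses; the rest is illustrative):

```python
import urllib.parse

# Building a search URL by hand: the term ends up in the 'q' query parameter.
term = 'cats'
url = 'https://www.google.com/search?' + urllib.parse.urlencode({'q': term})
print(url)  # https://www.google.com/search?q=cats

# Going the other way: recover the term from an existing search URL.
parsed = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
print(parsed['q'][0])  # cats
```

Whether fetching such a URL programmatically is permitted is a separate question (see the disadvantage above); the parsing itself is just string handling.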

Upvotes: 0

Dmitry Dubovitsky

Reputation: 2236

I can offer two popular solutions. 1) Google has a search-engine API: https://developers.google.com/products/#google-search (it has a restriction of 100 requests per day).

Truncated code:

import json
import re
import urllib2

def gapi_parser(args):
    query = args.text; count = args.max_sites
    import config
    api_key = config.api_key
    cx = config.cx

    # Note: this API returns up to the first 100 results only.
    # https://developers.google.com/custom-search/v1/using_rest?hl=ru-RU#WorkingResults

    results = []; domains = set(); errors = []; start = 1
    while True:
        if start >= 100:  # the Google API cannot return more than 100 results
            break
        req = 'https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={q}&alt=json&start={start}'.format(key=api_key, cx=cx, q=query, start=start)
        con = urllib2.urlopen(req)
        if con.getcode() == 200:
            data = con.read()
            j = json.loads(data)
            start = int(j['queries']['nextPage'][0]['startIndex'])
            for item in j['items']:
                match = re.search(r'^(https?://)?\w(\w|\.|-)+', item['link'])
                if match:
                    domain = match.group(0)
                    if domain not in results:
                        results.append(domain)
                    domains.update([domain])
                else:
                    errors.append("Can't recognize domain: %s" % item['link'])
            if len(domains) >= args.max_sites:
                break

    print
    for error in errors:
        print error
    return (results, domains)

2) I wrote a Selenium-based script that parses the page in a real browser instance, but this solution has some restrictions; for example, you may hit a CAPTCHA if you run searches like a robot.

Upvotes: 1

Chris Barker

Reputation: 2389

If you have access to the URL, you can look for "&q=" to find the search term (http://google.com/...&q=cats..., for example).
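A short Python 3 sketch of this idea (the file name searches.txt is just an example), extracting the `q` parameter with the standard library and appending it to a text file as the question asks:

```python
import urllib.parse

def extract_query(url):
    """Return the value of the 'q' parameter from a Google search URL, or None."""
    params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    values = params.get('q')
    return values[0] if values else None

def append_search_term(url, path='searches.txt'):
    """Append the extracted search term to a text file, one term per line."""
    term = extract_query(url)
    if term is not None:
        with open(path, 'a') as f:
            f.write(term + '\n')
    return term

# Example:
# append_search_term('https://www.google.com/search?q=cats&oq=cats')  # appends 'cats'
```

Note that this only covers parsing a URL you already have (e.g. copied from the address bar or browser history); getting at the live search bar contents is a separate problem.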

Upvotes: 1
