Sipher_

Reputation: 69

Is there a specific encoding that Google uses to encode its search queries?

Let me explain:

I'm building a URL parser in Python (source code included at the bottom), and I'm trying to find search queries in a URL. Through observation, I discovered that "+" in a search query translates to a space, but when I typed every character on the keyboard, I noticed other new sequences like %21. Is there a specific encoding for search queries in Google?

URL parser source code:

def parseUrl(url):
    if "?client" in url:
        browser = url[url.index("?client")+8:url.index("&")]
        print("[+] Found browser: "+browser)
    idxPoint = url.index("&q=")+3
    if "&sourceid" in url:
        endSearch = url.index("&sourceid")
    elif "&oq" in url:
        endSearch = url.index("&oq")
    else:
        print("[!] Error: couldn't find &sourceid or &oq in your url.")
        return
    parseDict = {"+":" "}
    searchQuery = url[idxPoint:endSearch]
    for parseObj in parseDict:
        searchQuery = searchQuery.replace(parseObj, parseDict[parseObj])
    print("[+] Found search term: \"" + searchQuery + "\"")
    return searchQuery

Upvotes: 0

Views: 291

Answers (1)

Blender

Reputation: 298532

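Percent-encoding is used when certain characters can't be inserted literally into a URL. It is standard URL encoding, not something specific to Google. For example, ? denotes the start of the query string, so a literal ? inside a value would make it impossible to parse https://example.org/foo?bar?baz unambiguously. These special characters are encoded as a percent sign followed by the two-digit hexadecimal value of each byte of the character (for ASCII characters, that is the ASCII code point), which is why ! shows up as %21. The + you noticed comes from form encoding (application/x-www-form-urlencoded), where a space in a query string is written as + instead of %20. For example: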

In [4]: ord(' ')
Out[4]: 32

In [5]: hex(ord(' '))
Out[5]: '0x20'
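
To see the rule applied in both directions, here is a minimal sketch using quote, quote_plus and unquote_plus from the standard library's urllib.parse. The sample strings are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Percent-encode a made-up sample string; safe="" means "/" is encoded too.
encoded = quote("hello world!", safe="")
print(encoded)  # hello%20world%21

# Form-style encoding, as used in query strings, writes a space as "+".
form_encoded = quote_plus("hello world!")
print(form_encoded)  # hello+world%21

# unquote_plus reverses both "+" and %XX escapes.
decoded = unquote_plus("hello+world%21")
print(decoded)  # hello world!

# Non-ASCII characters are encoded byte by byte from their UTF-8 form.
non_ascii = quote("é")
print(non_ascii)  # %C3%A9
```

Decoding with unquote_plus covers every escape at once, so there is no need to maintain a replacement table by hand.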

Python already has a built-in library for parsing query strings:

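from urllib.parse import parse_qs, urlsplit

def parseUrl(url):
    # parse_qs expects only the query string, so split it off the full URL first
    params = parse_qs(urlsplit(url).query)

    if 'client' in params:
        browser = params['client'][0]
        print('[+] Found browser:', browser)

    # parse_qs has already decoded "+" and %XX escapes in the values
    query = params['q'][0]
    print('[+] Found search term:', query)

    return query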
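
As a quick check of what parse_qs returns, here is a small sketch run against a made-up URL. The parameter values are assumptions for illustration, and real Google URLs may carry different parameters.

```python
from urllib.parse import parse_qs, urlsplit

# Made-up example URL; the parameter values are purely illustrative.
url = "https://www.google.com/search?client=firefox-b-d&q=hello+world%21&oq=hello"

# parse_qs works on the query string alone, so split it off the URL first.
params = parse_qs(urlsplit(url).query)
print(params)

# Each value is a list because a key may repeat; .get avoids a KeyError
# when the URL has no q parameter at all.
search_term = params.get("q", [None])[0]
print(search_term)
```

Because every value comes back as a list, indexing with [0] picks the first occurrence of a parameter.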

Upvotes: 1
