Reputation: 69
Let me explain:
I'm building a URL parser in Python (source code included at the bottom), and I'm trying to find search queries in a URL. Through observation, I discovered that a "+" in a search query translates to a space, but when I typed every character on the keyboard, I noticed other new sequences such as %21. Is there a specific encoding for search queries in Google?
URL parser source code:
def parseUrl(url):
    if "?client" in url:
        browser = url[url.index("?client")+8:url.index("&")]
        print("[+] Found browser: "+browser)
    idxPoint = url.index("&q=")+3
    if "&sourceid" in url:
        endSearch = url.index("&sourceid")
    elif "&oq" in url:
        endSearch = url.index("&oq")
    else:
        print("[!] Error: couldn't find &gs or &oq in your url.")
        return
    parseDict = {"+":" "}
    searchQuery = url[idxPoint:endSearch]
    for parseObj in parseDict:
        searchQuery = searchQuery.replace(parseObj, parseDict[parseObj])
    print("[+] Found search term: \"",searchQuery+"\"")
    return searchQuery
Upvotes: 0
Views: 291
Reputation: 298532
Percent encoding is used when certain characters can't be literally inserted into a URL.
For example, ? denotes the start of the query string, so a literal ? inside a value would make unambiguously parsing https://example.org/foo?bar?baz impossible.
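These special characters are encoded as a percent sign followed by the character's ASCII code point as two hex digits (non-ASCII characters are encoded as UTF-8 first, one %XX per byte). A space, for example, is code point 32, or 0x20 in hex, so it becomes %20; likewise, %21 is !: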
In [4]: ord(' ')
Out[4]: 32
In [5]: hex(ord(' '))
Out[5]: '0x20'
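The same mapping can be checked with the standard library. This is a minimal sketch, assuming the sample string below (made up for illustration): unquote decodes only %XX escapes, while unquote_plus also treats + as a space, which is the form-encoding convention that query strings commonly use and would explain the + seen in the question.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

# Hypothetical raw value of a q= parameter, made up for illustration
raw = "hello+world%21+100%25"

# unquote decodes %XX escapes but leaves "+" untouched
percent_only = unquote(raw)

# unquote_plus also turns "+" into a space (form-encoding convention)
decoded = unquote_plus(raw)

# quote_plus goes the other way: it encodes a plain string for use as a query value
encoded = quote_plus("hello world! 100%")

print(percent_only)
print(decoded)
print(encoded)
```

Decoding with unquote_plus and re-encoding with quote_plus round-trips this particular sample, so there is no need to maintain a hand-written replacement table.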
Python's standard library can already parse query strings, and it decodes both percent escapes and + for you:
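from urllib.parse import parse_qs, urlparse

def parseUrl(url):
    # parse_qs expects only the query string, so split it off the URL first
    params = parse_qs(urlparse(url).query)
    if 'client' in params:
        browser = params['client'][0]
        print('[+] Found browser:', browser)
    if 'q' not in params:
        print("[!] Error: couldn't find q in your url.")
        return
    # Values are lists because a key can appear more than once
    query = params['q'][0]
    print('[+] Found search term:', query)
    return query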
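One detail worth checking when trying this out: parse_qs works on the query string, not on a whole URL. The sketch below uses a hypothetical search URL (made up for illustration) and urlsplit to isolate the part after the ?, then shows what the first key looks like if the full URL is passed in instead.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical search URL, made up for illustration
url = "https://www.google.com/search?client=firefox-b-d&q=hello+world%21"

# parse_qs expects only the part after "?", so split that off first
query_string = urlsplit(url).query
params = parse_qs(query_string)

# Each key maps to a list, because a key may appear more than once
browser = params["client"][0]
search_term = params["q"][0]

# Passing the whole URL makes the first key include the scheme, host and path
first_key = next(iter(parse_qs(url)))

print(browser)
print(search_term)
print(first_key)
```

With the full URL, a lookup of 'client' would silently miss whenever client is the first parameter, which is why splitting off the query string first seems the safer habit.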
Upvotes: 1