Reputation: 1
The problem lies somewhere in how I'm parsing and/or reassembling URLs. I'm losing the ?id=1 and getting ?d=1.
What I am trying to do is manipulate any query parameter and reassemble the URL before sending it back out modified. That is, the dictionaries would be modified, then I would use urlencode(modified_dict) and reassemble url + query.
Can someone give me a pointer on what I'm doing wrong here?
from urlparse import parse_qs, urlparse , urlsplit
from urllib import urlencode
import os
import sys
import mechanize
from collections import OrderedDict
import urllib2
scrape_post_urls = []
get_inj_tests = []
#check multiple values to strip out duplicate and useless checks
def parse_url(url):
    parsed = urlparse(url,allow_fragments=False)
    if parsed.query:
        if url not in get_inj_tests:
            get_inj_tests.append(url)
            #print url
            '''get_inj_tests.append(url)
            print url
            #print 'scheme :', parsed.scheme
            #print 'netloc :', parsed.netloc
            print 'path :', parsed.path
            print 'params :', parsed.params
            print 'query :', parsed.query
            print 'fragment:', parsed.fragment
            #print 'hostname:', parsed.hostname, '(netloc in lower case)'
            #print 'port :', parsed.port
            '''
    else:
        if url not in scrape_post_urls:
            scrape_post_urls.append(url)
            #print url
def main():
    unparsed_urls = open('in.txt','r')
    for urls in unparsed_urls:
        try:
            parse_url(urls)
        except:
            pass
    print(len(scrape_post_urls))
    print(len(get_inj_tests))
    clean_list = list(OrderedDict.fromkeys(get_inj_tests))
    reaasembled_url = ""
    #print clean_list
    for query_test in clean_list:
        url_object = urlparse(query_test,allow_fragments=False)
        #parse query paramaters
        url = query_test.split("?")[1]
        dicty = {x[0] : x[1] for x in [x.split("=") for x in url[1:].split("&") ]}
        query_pairs = [(k,v) for k,vlist in dicty.iteritems() for v in vlist]
        reaasembled_url = "http://" + str(url_object.netloc) + str(url_object.path) + '?'
        reaasembled_query = urlencode(query_pairs)
        full_url = reaasembled_url + reaasembled_query
        print dicty
main()
Upvotes: 0
Views: 2058
Reputation: 77912
Can someone give me a pointer on what I'm doing wrong here.
Well, quite simply, you're not using the existing tools:
1/ To parse a query string, use urllib.parse.parse_qsl().
2/ To reassemble the query string, use urllib.parse.urlencode().
And forget about dicts: query strings can have multiple values for the same key, i.e. ?foo=1&foo=2 is perfectly valid.
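As a rough illustration of those two steps, here is a minimal sketch that parses the query into a list of pairs, changes one value, and rebuilds the URL. The helper name rewrite_query and the example.com URL are made up for this example. The names given above are the Python 3 locations; in Python 2, as in the question, the same functions live in urlparse and urllib, so the import falls back to those.

```python
try:  # Python 3
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
except ImportError:  # Python 2, as used in the question
    from urlparse import parse_qsl, urlsplit, urlunsplit
    from urllib import urlencode


def rewrite_query(url, replacements):
    """Hypothetical helper: replace the values of the keys in `replacements`."""
    parts = urlsplit(url)
    # A list of (key, value) pairs keeps repeated keys and their order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    new_pairs = [(key, replacements.get(key, value)) for key, value in pairs]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(new_pairs), parts.fragment)
    )


print(rewrite_query("http://example.com/item.php?id=1&foo=1&foo=2", {"id": "42"}))
# http://example.com/item.php?id=42&foo=1&foo=2
```

Working on the list of pairs rather than a dict is what keeps both foo values in the rebuilt URL.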
Upvotes: 2
Reputation: 454
First of all, url is a bad name for the variable that holds the params, and this could create confusion.
>>> url = "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1]
>>> url
'id=22&param1=1&param2=2'
>>> "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1].split("&")
['id=22', 'param1=1', 'param2=2']
The error is in url[1:].split("&"): the [1:] slice drops the first character of the query string, so id=... becomes d=...
Solution:
>>> dicty = {x[0] : x[1] for x in [x.split("=") for x in url.split("&") ]}
>>> dicty
{'id': '22', 'param1': '1', 'param2': '2'}
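To see the effect of the slice in isolation, here is a small sketch using a made-up example.com URL. It also shows a second thing that looks likely to bite in the question's code, assuming the dict values are plain strings as they are here: looping with "for v in vlist" over a string yields one pair per character, not one pair per value.

```python
# The part after "?" already starts at the first key, so there is nothing to skip.
query = "http://example.com/page.php?id=1&x=2".split("?")[1]
sliced = query[1:]  # drops the "i" of "id"

print(query)   # id=1&x=2
print(sliced)  # d=1&x=2

# Iterating over a string value walks its characters.
value = "22"
pairs = [("id", v) for v in value]
print(pairs)   # [('id', '2'), ('id', '2')]
```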
Upvotes: 0