Reputation: 162
I'm trying to add Google URL Builder's functionality to my application.
https://support.google.com/analytics/answer/1033867?hl=en
Unfortunately, I'm not getting exactly the same results.
My code
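from re import sub
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

def buildurl(self, url):
    # take out old url builder parameters
    url = sub(r'\?utm_source=.*?(&|$)|utm_medium=.*?(&|$)|utm_term=.*?(&|$)|utm_content=.*?(&|$)|utm_campaign=.*?(&|$)', '', url)
    # build url (appended to the end of the string, i.e. after any #fragment)
    header = '?utm_source=' + quote(self.data['source'])
    header += '&utm_medium=' + quote(self.data['medium'])
    header += '&utm_campaign=' + quote(self.data['campaign'])
    # return long url
    return url + header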
My code returns this: http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html#axzz3ANwb5XDu?utm_source=source&utm_medium=medi&utm_campaign=test
Google's URL Builder Returns this: http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html?utm_source=source&utm_medium=medi&utm_campaign=test#axzz3ANwb5XDu
I could push the #axzz3ANwb5XDu to the back, but is there a way to parse and reconstruct the url in a standardized way?
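A quick way to see why the order matters: a URL parser treats everything after the first `#` as the fragment, so parameters appended after it never reach the query string. This sketch uses Python 3's `urllib.parse.urlsplit` on the output of the code above.

```python
from urllib.parse import urlsplit

# URL as produced by the code above: UTM parameters appended after the '#'
broken = ("http://iipdigital.usembassy.gov/st/english/article/2014/08/"
          "20140813305633.html#axzz3ANwb5XDu"
          "?utm_source=source&utm_medium=medi&utm_campaign=test")

parts = urlsplit(broken)
print(repr(parts.query))  # '' : the query string is empty
print(parts.fragment)     # the UTM parameters ended up inside the fragment
```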
Upvotes: 1
Views: 1370
Reputation: 4017
I would go for Python's urllib; it's part of the standard library.
import urllib.parse
getVars = {'var1': 'some_data', 'var2': 1337}
url = 'http://domain.com/somepage/?'
print(url + urllib.parse.urlencode(getVars))
Output (on Python 3.7+, where dicts keep insertion order):
http://domain.com/somepage/?var1=some_data&var2=1337
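The snippet above assumes the base URL has no existing query or fragment. A minimal sketch (Python 3; the helper name `add_query_params` is hypothetical) that splits an existing URL first, merges in the new parameters with `urlencode`, and reassembles it so the fragment stays last:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_query_params(url, params):
    """Merge params into url's query string, keeping any #fragment at the end."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # existing parameters (blank values are dropped)
    query.update(params)                  # new values override existing ones
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

print(add_query_params('http://domain.com/somepage/?var1=old#top',
                       {'var1': 'some_data', 'var2': 1337}))
# http://domain.com/somepage/?var1=some_data&var2=1337#top
```

Converting to a dict collapses repeated keys; keep a list of pairs instead if the URL may legitimately repeat a parameter.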
Upvotes: 1
Reputation: 287835
There is a way to parse the URL; it's called urlparse:
import re

try:
    from urllib.parse import urlparse, urlunparse
except ImportError:  # Python 2.x
    from urlparse import urlparse, urlunparse

def buildurl(self, url):
    scheme, netloc, path, params, query, fragment = urlparse(url)
    # take out old url builder parameters (the parsed query has no leading '?')
    query = re.sub(r'(^|&)utm_(source|medium|term|content|campaign)=[^&]*', '', query).lstrip('&')
    # build url, joining with '&' if other parameters remain
    utm = 'utm_source=' + self.data['source']
    utm += '&utm_medium=' + self.data['medium']
    utm += '&utm_campaign=' + self.data['campaign']
    query = query + '&' + utm if query else utm
    return urlunparse((scheme, netloc, path, params, query, fragment))
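To illustrate what urlparse hands back, here is a small Python 3 demo on the URL from the question: the fragment comes out as its own component, and urlunparse puts the query before it when reassembling.

```python
from urllib.parse import urlparse, urlunparse

url = ("http://iipdigital.usembassy.gov/st/english/article/2014/08/"
       "20140813305633.html#axzz3ANwb5XDu")

# six components: scheme, netloc, path, params, query, fragment
scheme, netloc, path, params, query, fragment = urlparse(url)
print(repr(query), fragment)  # '' axzz3ANwb5XDu

# urlunparse inserts '?' before the query and '#' before the fragment
print(urlunparse((scheme, netloc, path, params, 'utm_source=source', fragment)))
# http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html?utm_source=source#axzz3ANwb5XDu
```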
Upvotes: 0
Reputation: 5168
You should check out the urlparse module. I have modified your code so that it removes the existing URL builder parameters but keeps any other parts of the query.
import re

try:
    from urllib.parse import urlparse, urlunparse
except ImportError:  # Python 2.x
    from urlparse import urlparse, urlunparse

def buildurl(self, url):
    # Parse the url.
    o = urlparse(url)
    # Take out old url builder parameters, keeping the rest of the query.
    query = re.sub(r'(^|&)utm_(source|medium|term|content|campaign)=[^&]*', '', o.query).lstrip('&')
    # Build url query, separated from any remaining parameters by '&'.
    if query:
        query += '&'
    query += 'utm_source=' + self.data['source']
    query += '&utm_medium=' + self.data['medium']
    query += '&utm_campaign=' + self.data['campaign']
    # Return the url with the corrected query (urlunparse takes a single 6-tuple).
    return urlunparse((o.scheme, o.netloc, o.path, o.params, query, o.fragment))
The fragment should be at the end of the URL; urlunparse puts it there when it reassembles the parts.
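A self-contained variant, sketched for Python 3, that avoids the regex by letting `parse_qsl` split the query and `urlencode` rebuild it. The function name `build_campaign_url` and the `utm` dict (standing in for `self.data`) are hypothetical; it drops every existing parameter whose name starts with `utm_` and keeps all others.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def build_campaign_url(url, utm):
    """Return url with its UTM parameters replaced by the pairs in utm."""
    o = urlparse(url)
    # keep every existing parameter that is not a UTM tag
    kept = [(k, v) for k, v in parse_qsl(o.query, keep_blank_values=True)
            if not k.startswith('utm_')]
    query = urlencode(kept + list(utm.items()))  # also percent-encodes the values
    # urlunparse re-inserts '?' before the query and '#' before the fragment
    return urlunparse((o.scheme, o.netloc, o.path, o.params, query, o.fragment))

print(build_campaign_url(
    "http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html#axzz3ANwb5XDu",
    {'utm_source': 'source', 'utm_medium': 'medi', 'utm_campaign': 'test'}))
# http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html?utm_source=source&utm_medium=medi&utm_campaign=test#axzz3ANwb5XDu
```

With the question's inputs this reproduces the URL Builder output shown there.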
Upvotes: 1