Reputation: 355
I'm looking for a way to edit a string. My string looks like this: http://www.example.com/example:8080
What I want to do is find the third occurrence of "/" and then edit the string to http://www.example.com:8080
So basically, I want to remove whatever is between the third occurrence of "/" and the second occurrence of ":". I tried writing a regular expression and got as far as the first part. It looks like this: ((.*?/){3}(.*))
But how do I handle the second part and get the final string?
Thanks
EDIT:
The number of times "/" occurs is not a concern. The string could even be http://www.example.com/example/index.php:8080
What I want is for everything from the third occurrence of "/" up to the second occurrence of ":" to be removed, so that the final string is http://www.example.com:8080
Upvotes: 2
Views: 154
Reputation: 17971
Since you haven't accepted an answer, you might be stuck. Here is an example that does the trick, using the approach explained in the other answers.
from urllib2 import urlparse
url = 'http://www.example.com/example:8080'
parsedURL = urlparse.urlparse(url)
port = url.split(':')[2]
fixedURL = parsedURL.scheme + '://' + parsedURL.netloc + ':' + port
The urlparse.urlparse() call takes the URL and parses it.
The last line rebuilds the URL, cutting out everything after the / and before the :
This will only work if the port is at the end and the URL contains exactly two : characters.
Upvotes: 0
Reputation: 40688
I have two solutions: use the urlparse module (preferred) or a regular expression.
import urlparse
import re
# METHOD 1: use urlparse
# Parse the incorrect URL
incorrect_url = 'http://www.example.com/example:8080'
scheme, netloc, path, params, query, fragment = urlparse.urlparse(incorrect_url)
# Fix up
path, port = path.split(':')
netloc = netloc + ':' + port
path = ''
# Putting them all together
correct_url = urlparse.urlunparse((scheme, netloc, path, params, query, fragment))
print correct_url
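# METHOD 2: use regular expression
# Note: the unpacking below expects exactly six pieces, so it raises ValueError if the path contains more than one segment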
scheme, dummy1, dummy2, netloc, path, port=re.split(r'[/:]', incorrect_url)
correct_url = '{}://{}:{}'.format(scheme, netloc, port)
print correct_url
In general, when dealing with URLs, I prefer the right tool: urlparse. The regular expression solution has the advantage of being shorter, but might get you into trouble for some corner cases.
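For the multi-segment case from the question's edit (for example /example/index.php:8080), a single substitution may be easier than unpacking a split. The following is only a sketch: the helper name strip_path_before_port is made up for illustration, and the pattern assumes the URL has the shape scheme://host/path:port. It is written so that it runs on Python 3 as well.

```python
import re

# Group 1: everything before the third "/" (scheme, "//", host).
# Then the third "/" and any characters that are not ":" (the unwanted path).
# Group 2: the ":" and whatever follows it (the port).
_PATTERN = re.compile(r'^((?:[^/]*/){2}[^/]*)/[^:]*(:.*)$')


def strip_path_before_port(url):
    """Hypothetical helper: drop the text between the third '/' and the next ':'."""
    # If the pattern does not match, sub() returns the string unchanged.
    return _PATTERN.sub(r'\1\2', url)


print(strip_path_before_port('http://www.example.com/example:8080'))
print(strip_path_before_port('http://www.example.com/example/index.php:8080'))
```

Both calls should print http://www.example.com:8080. A URL without a path is returned unchanged, but other corner cases (a port already in the host, a ":" in the query string) are not handled.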
Upvotes: 0
Reputation: 14929
Not an exact answer to the question, but it might solve the problem. If that's how the URL always looks, you could use the urlparse module from urllib2.
In [9]: from urllib2 import urlparse
In [10]: parsed_url = urlparse.urlparse('http://www.example.com/example:8080')
In [11]: parsed_url
Out[11]: ParseResult(scheme='http', netloc='www.example.com', path='/example:8080', params='', query='', fragment='')
In [12]: parsed_url.path
Out[12]: '/example:8080'
In [13]: parsed_url.path.split(':')
Out[13]: ['/example', '8080']
You can do the rest from here, I think.
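One possible way to finish this off is sketched here. It assumes Python 3, where the module is urllib.parse (in Python 2 the import would be from urlparse import urlparse). The function name move_port_to_host is made up for illustration, and the sketch assumes the port is whatever follows the last ":" in the path.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def move_port_to_host(url):
    """Hypothetical helper: move a trailing ':port' from the path onto the host."""
    parsed = urlparse(url)
    # rpartition splits on the last ':' so extra path segments do not matter.
    _path, sep, port = parsed.path.rpartition(':')
    if not sep:
        return url  # no ':' in the path, nothing to fix
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    fixed = parsed._replace(netloc=parsed.netloc + ':' + port, path='')
    return fixed.geturl()


print(move_port_to_host('http://www.example.com/example:8080'))
print(move_port_to_host('http://www.example.com/example/index.php:8080'))
```

Both calls should print http://www.example.com:8080. Splitting on the last ":" rather than the first is a design choice so that the longer URL from the question's edit is handled the same way.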
Upvotes: 1
Reputation: 421
A simple but ugly way would be:
>>> x = 'http://www.example.com/example:8080'
>>> x.find('/',x.find('/',x.find('/')+1)+1)
22
>>> x.rfind(':')
30
>>> x[:22] + x[30:]
'http://www.example.com:8080'
Note that rfind() searches backwards from the end of the string. Beware that this might go wrong if your URL doesn't look the way you expect it to. The x[:22] and x[30:] parts are examples of slicing, a useful feature of Python. For more information, you could read the tutorial section on strings in Python.
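The same idea can be wrapped in a function so the indices 22 and 30 are computed rather than typed in. This is a rough sketch, not a robust URL parser: the name cut_between is made up, and it assumes the port follows the last ":" in the string.

```python
def cut_between(url):
    """Hypothetical helper: remove the text from the third '/' up to the last ':'."""
    # Find the third '/' by searching three times, each time starting
    # just after the previous hit.
    third_slash = -1
    for _ in range(3):
        third_slash = url.find('/', third_slash + 1)
        if third_slash == -1:
            return url  # fewer than three slashes, leave the string alone
    last_colon = url.rfind(':')
    if last_colon < third_slash:
        return url  # no ':' after the third '/', nothing to cut
    # Keep everything before the third '/' and everything from the last ':' on.
    return url[:third_slash] + url[last_colon:]


print(cut_between('http://www.example.com/example:8080'))
print(cut_between('http://www.example.com/example/index.php:8080'))
```

Both calls should print http://www.example.com:8080. Strings that do not have the expected shape are returned unchanged instead of being sliced at a wrong position.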
Upvotes: 2