Reputation: 107
What is the preferred way to cut off random characters at the end of a string in Python?
I am trying to simplify a list of URLs for some analysis, and therefore need to cut off everything that comes after the file extension .php. Since the characters that follow .php are different for each URL, using strip() doesn't work. I thought about regex and substring(). But what would be the most efficient way to solve this task?
Example:
Let's say I have the following URLs:
example.com/index.php?random_var=random-19wdwka
example.org/index.php?another_var=random-2js9m2msl
And I want the output to be:
example.com/index.php
example.org/index.php
Thanks for your advice!
Upvotes: 1
Views: 362
Reputation: 31270
It seems like what you really want is to strip away the parameters of the URL. You can use
from urlparse import urlparse, urlunparse  # Python 3: from urllib.parse import urlparse, urlunparse
urlunparse(urlparse(url)[:3] + ('', '', ''))  # keep scheme, netloc, path
to replace the params, query and fragment parts of the URL with empty strings and generate a new one.
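For readers on Python 3, the same idea might look like the sketch below. It assumes the functions live in urllib.parse (the Python 3 home of the old urlparse module); the helper name strip_params is made up for illustration, and the URLs are the ones from the question.

```python
from urllib.parse import urlparse, urlunparse


def strip_params(url):
    """Return url without its params, query, and fragment parts."""
    parts = urlparse(url)
    # Keep scheme, netloc, and path; blank out params, query, and fragment.
    return urlunparse(tuple(parts[:3]) + ("", "", ""))


urls = [
    "example.com/index.php?random_var=random-19wdwka",
    "example.org/index.php?another_var=random-2js9m2msl",
]
cleaned = [strip_params(u) for u in urls]
print(cleaned)
```

Because the URLs in the question have no scheme, urlparse treats the whole "example.com/index.php" part as the path, which is fine for this purpose.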
Upvotes: 0
Reputation: 2159
Split on your separator at most once, and take the first piece:
text = "example.com/index.php?random_var=random-19wdwka"
sep = ".php"
# maxsplit=1: split only at the first separator, then put the separator back
rest = text.split(sep, 1)[0] + sep
print(rest)
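A possible variant, in case some URLs do not contain the separator at all: str.partition returns an empty separator when nothing is found, so the string comes back unchanged instead of having ".php" appended. The helper name cut_after is only illustrative.

```python
def cut_after(text, marker=".php"):
    """Return text up to and including the first marker, or text unchanged."""
    head, found, _tail = text.partition(marker)
    # found is "" when the marker is absent; in that case head == text.
    return head + found


print(cut_after("example.com/index.php?random_var=random-19wdwka"))
print(cut_after("example.com/about"))
```

This keeps the single-pass behaviour of a one-time split without assuming every URL contains ".php".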
Upvotes: 0
Reputation: 995
There are two ways to accomplish what you want.
If you know that the part you want to keep always ends with .php followed by a ?, then all you need to do is:
my_string.split('?')[0]
More generally, you can use urlparse and take everything but the parameters:
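from urlparse import urlparse  # Python 3: from urllib.parse import urlparse
for url in urls:
    p = urlparse(url)
    # Re-insert "://" only when the URL actually has a scheme
    prefix = (p.scheme + "://") if p.scheme else ""
    print(prefix + p.netloc + p.path)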
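If the list mixes full URLs (with a scheme) and bare ones, a sketch like the following lets the library reassemble the URL rather than concatenating the pieces by hand. It assumes Python 3's urllib.parse; drop_query is a hypothetical helper name, and the first URL is a made-up variation on the question's example.

```python
from urllib.parse import urlsplit


def drop_query(url):
    """Return url with its query string and fragment removed."""
    # _replace is the namedtuple method; geturl() rebuilds the URL string.
    return urlsplit(url)._replace(query="", fragment="").geturl()


urls = [
    "https://example.com/index.php?random_var=random-19wdwka#top",
    "example.org/index.php?another_var=random-2js9m2msl",
]
for url in urls:
    print(drop_query(url))
```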
Upvotes: 1