Rajesh Majumdar

Reputation: 9

How to remove .com from a URL in Python?

I want to remove the domain suffix from a URL. For example, if the user enters www.google.com, I only need www.google.

How can I do this in Python? Thanks.

Upvotes: 0

Views: 3422

Answers (4)

AdrienW

Reputation: 3452

If you just want to remove the last 4 characters, slice the string:

url = 'www.google.com'
cut_url = url[:-4]  # slice the url variable, not the built-in str
# cut_url is now 'www.google'

More advanced answer

If you have a list of all the possible domain parts:

domains = ['com', 'uk', 'fr', 'net', 'co', 'nz']  # and so on...
while True:
    domain = url.split('.')[-1]
    if domain in domains:
        url = '.'.join(url.split('.')[:-1])
    else:
        break

Or if, for example, you have a domains list where .co and .uk are not separated:

domains = ['.com', '.co.uk', '.fr', '.net', '.co.nz']  # and so on...
for domain in domains:
    if url.endswith(domain):
        cut_url = url[:-len(domain)]
        break
else:  # there is no indentation mistake here.
       # else after for will be executed if for did not break
    print('no known domain found')
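
The same suffix check can be wrapped in a small function. This is only a sketch: the name strip_known_suffix and the sample suffix list are made up for illustration, and the list is nowhere near complete. Sorting the suffixes by length means '.co.uk' is tried before '.uk', so the order of the list no longer matters.

```python
def strip_known_suffix(url, suffixes):
    """Return url without its longest matching suffix, or unchanged if none match."""
    # Try longer suffixes first so '.co.uk' wins over '.uk'.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Skip empty strings: url[:-0] would return '' instead of url.
        if suffix and url.endswith(suffix):
            return url[:-len(suffix)]
    return url


# Hypothetical, incomplete suffix list used only for this example.
known = ['.com', '.uk', '.co.uk', '.fr', '.net', '.co.nz']

print(strip_known_suffix('www.google.co.uk', known))  # www.google
print(strip_known_suffix('www.google.com', known))    # www.google
print(strip_known_suffix('localhost', known))         # localhost
```

Because the function returns the input unchanged when nothing matches, the caller does not need a for/else branch.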

Upvotes: 0

i333

Reputation: 13

You can try the rstrip function, but note the caveat in the UPDATE below.

Try this code:

url = 'www.google.com'
url2 = 'www.google'

new_url = url.rstrip('.com')
print(new_url)   # www.google

new_url2 = url2.rstrip('.com')
print(new_url2)  # www.google

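Be aware that rstrip does not remove a suffix string. It treats its argument as a set of characters and strips every trailing character that belongs to that set, so '.com' here means "any of '.', 'c', 'o', 'm'". It happens to give www.google for both inputs above, but 'www.cisco.com'.rstrip('.com') returns 'www.cis'. lstrip does the same on the left-hand side, and strip works on both ends. Check the Python docs for all three.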

UPDATE

As @SteveJessop pointed out, the example above is NOT the right solution, so I'm submitting another one. It is similar to another answer here, but it first checks that the string ends with '.com'.

url = 'www.foo.com'
if url.endswith('.com'):
    url = url[:-4]
    print(url)
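
To make the difference between rstrip and a real suffix check concrete, here is a small sketch. The helper name remove_suffix is made up for this example. On Python 3.9 and later the built-in str.removesuffix method does the same job.

```python
def remove_suffix(text, suffix):
    """Remove suffix from the end of text only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


# rstrip treats '.com' as the character set {'.', 'c', 'o', 'm'}
# and keeps stripping until it meets a character outside that set.
print('www.cisco.com'.rstrip('.com'))          # www.cis

# The suffix check removes exactly one trailing '.com'.
print(remove_suffix('www.cisco.com', '.com'))  # www.cisco
print(remove_suffix('www.cisco.org', '.com'))  # www.cisco.org
```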

Upvotes: -1

Learner

Reputation: 639

To solve this without having to know the domain suffix at all, you can count dots from the left-hand side and keep everything before the second dot.

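t = 'www.google.com'
a = t.split('.')[1]  # second dot-separated part: 'google'
pos = t.find(a)      # index of its first occurrence in t
t = t[:pos+len(a)]   # keep everything up to the end of that part

# t is now 'www.google'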
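
One caveat with find: it returns the first occurrence of the second part, so a host such as 'www.w.com' would be cut to 'w'. A variant that joins the first two parts avoids that. This is only a sketch, and the function name first_two_labels is made up for this example.

```python
def first_two_labels(host):
    """Keep everything before the second dot (or the whole string if there is none)."""
    return '.'.join(host.split('.')[:2])


print(first_two_labels('www.google.com'))    # www.google
print(first_two_labels('www.google.co.uk'))  # www.google
print(first_two_labels('www.w.com'))         # www.w
print(first_two_labels('localhost'))         # localhost
```

Like the original, this assumes the input always starts with a prefix such as www. For 'google.com' it returns the string unchanged.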

Upvotes: 2

holdenweb

Reputation: 37033

This is a very general question, but the narrowest answer would be as follows (assuming url holds the URL in question):

if url.endswith(".com"):
    url = url[:-4]

If you want to remove the last period and everything to the right of it, the code is a little more complicated:

pos = url.rfind('.') # find rightmost dot
if pos >= 0:         # found one
    url = url[:pos]
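
The same "cut at the rightmost dot" idea can also be written with rsplit, which splits from the right at most once. This is a sketch of an alternative, not a replacement for the code above, and the function name drop_last_label is made up for this example.

```python
def drop_last_label(url):
    """Remove the last dot and everything after it; leave dot-free strings alone."""
    # rsplit('.', 1) splits once from the right; with no dot present it
    # returns a one-element list holding the original string.
    return url.rsplit('.', 1)[0]


print(drop_last_label('www.google.com'))    # www.google
print(drop_last_label('www.google.co.uk'))  # www.google.co
print(drop_last_label('localhost'))         # localhost
```

As the second line shows, this removes only one part, so multi-part suffixes such as .co.uk still need a suffix list.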

Upvotes: 3
