Kshitij Yadav

Reputation: 1387

Removing a dot (".") from a string using Python

I have data that looks like this:

I have written a script which looks like this:

data['website']=data['Website address'].str.split('www.').str[1]
data['website']=data['website'].str.split('.com').str[0]

The first line removes the "www." prefix, and the second line is intended to remove the ".com" from the string. The result I should be getting for the 1st and 2nd data points is:

But instead I am getting "r". So I think Python is not interpreting "." as a literal dot, but as any character before "com".

I would like to know how to remove suffixes such as ".ru", ".com", ".it", etc. Kindly help.
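
The suspicion about "." can be checked outside pandas with the standard `re` module. This is only a minimal sketch: the sample address is assumed from the expected output mentioned in the answers, and whether pandas' `str.split` treats a multi-character pattern as a regular expression may depend on the pandas version and its `regex` argument.

```python
import re

address = "r-computer.com"  # sample value, assumed for illustration

# Unescaped: "." matches any character, so "-com" inside "r-computer" also matches.
unescaped = re.split(".com", address)

# Escaped: only a literal ".com" is treated as a separator.
escaped = re.split(re.escape(".com"), address)

print(unescaped[0])  # r
print(escaped[0])    # r-computer
```

If the pattern is interpreted as a regular expression, escaping the dot (`r"\.com"`) or asking for a literal match should avoid the stray "r".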

Upvotes: 1

Views: 3890

Answers (3)

allmtn

Reputation: 1

You can try this, which removes every "." from the string:

yourstring.translate({ord('.'):None})
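
A quick check of what this call does, as a minimal sketch (the helper name `strip_all_dots` and the sample address are made up for illustration). Note that the mapping deletes all dots, so on its own it does not isolate the site name:

```python
def strip_all_dots(s):
    # Map the code point of "." to None, which tells translate() to drop it.
    return s.translate({ord("."): None})


result = strip_all_dots("www.r-computer.com")
print(result)  # wwwr-computercom
```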

Upvotes: 0

andole

Reputation: 286

import re


def get_domain(s):
    return re.sub(r"^www\.(.+)\.[^.]+$", r"\1", s)

print(get_domain("www.r-computer.com"))   # r-computer


(untested) This version returns both the site name and the suffix (com, org, etc.), or (None, None) if there is no match:

import re


def get_domain(s):
    ret = re.findall(r"^www\.(.+)\.([^.]+)$", s)
    return ret[0] if ret else (None, None)


# example
a, b = get_domain("www.italy.it")

if a and b:
    print(a)  # italy
    print(b)  # it
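
As a rough sketch of how the same idea might be applied to several addresses at once (the name `split_domain` and the sample list are hypothetical, not from the question), the pattern can be compiled once and mapped over the values:

```python
import re

# Same pattern as above: "www.", then the site name, then the last dotted label.
PATTERN = re.compile(r"^www\.(.+)\.([^.]+)$")


def split_domain(s):
    # Return (site, suffix), or (None, None) when the address does not match.
    m = PATTERN.match(s)
    return m.groups() if m else (None, None)


addresses = ["www.r-computer.com", "www.italy.it", "localhost"]
parsed = [split_domain(a) for a in addresses]
print(parsed)
```

One caveat with this approach: only the last label is treated as the suffix, so an address such as "www.example.co.uk" would come back as ("example.co", "uk").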

Upvotes: 2

Simon Crane

Reputation: 2182

For the examples provided, this will work:

data['website'] = data['Website address'].str.split('.').str[1]

This gets the text between the first and second '.'.
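
The same idea on plain Python strings, as a minimal sketch (the helper name `second_label` and the sample values are assumptions for illustration). It also shows where the approach stops working: it relies on every address starting with "www.".

```python
def second_label(address):
    # Split on literal dots and take the piece between the first and second dot.
    parts = address.split(".")
    return parts[1] if len(parts) > 1 else None


print(second_label("www.r-computer.com"))  # r-computer
print(second_label("www.italy.it"))        # italy
print(second_label("r-computer.com"))      # com (no "www." prefix, so the wrong piece)
```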

Upvotes: 0
