orgen

Reputation: 180

Add column to Pandas DataFrame created by function?

I have a CSV file containing the following URLs:

1;https://www.one.de 
2;https://www.two.de 
3;https://www.three.de
4;https://www.four.de
5;https://www.five.de

Then I load it into a pandas DataFrame df.

import pandas as pd
cols = ['nr', 'url']
df = pd.read_csv("listing.csv", sep=';', encoding="utf8", dtype=str, names=cols)

Then I would like to add another column, 'domain_name', holding the domain name for each nr.

from urllib.parse import urlsplit

def takedn(url):
    m = urlsplit(url)
    return m.netloc.split('.')[-2]

df['domain_name'] = takedn(df['url'].all())
print(df.head())

But it assigns the last domain name to every row.

Output:
  nr                   url domain_name
0  1    https://www.one.de        five
1  2    https://www.two.de        five
2  3  https://www.three.de        five
3  4   https://www.four.de        five
4  5   https://www.five.de        five

I am trying this to learn vectorization, but it does not work as I expected. The domain_name in the first row should be one, in the second two, and so on.

Upvotes: 0

Views: 35

Answers (2)

BENY

Reputation: 323226

The tldextract package provides a function for this:

import tldextract
df['domain'] = df.url.map(lambda x : tldextract.extract(x).domain)
df
   nr                   url domain_name domain
0   1    https://www.one.de        five    one
1   2    https://www.two.de        five    two
2   3  https://www.three.de        five  three
3   4   https://www.four.de        five   four
4   5   https://www.five.de        five   five
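
One possible reason to prefer tldextract over splitting the host on dots: the split('.')[-2] approach from the question assumes a single-label suffix such as .de. A minimal stdlib-only sketch shows where that assumption breaks (the example.co.uk URL is an illustrative assumption, not part of the question's data):

```python
from urllib.parse import urlsplit

def takedn(url):
    # Same logic as in the question: second-to-last dot-separated label of the host.
    return urlsplit(url).netloc.split('.')[-2]

single = takedn("https://www.one.de")         # single-label suffix ".de"
multi = takedn("https://www.example.co.uk")   # two-label suffix ".co.uk" (hypothetical URL)

print(single, multi)
```

Here the second call yields co rather than example. tldextract consults the Public Suffix List, so it is meant to handle suffixes like this.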

Upvotes: 1

Ynjxsjmh

Reputation: 30022

To operate on each element, you can use apply():

from urllib.parse import urlsplit

def takedn(url):
    m = urlsplit(url)
    return m.netloc.split('.')[-2]

df['domain_name'] = df['url'].apply(takedn)
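
For context, takedn(df['url'].all()) reduces the whole column to a single value before takedn ever runs, so the function is called once and its one result is broadcast to every row. A self-contained sketch of the element-wise version follows. It builds a small frame inline instead of reading listing.csv, and the domain_vec column is a hypothetical name for an alternative that uses only pandas string methods, in case that is closer to what "vectorizing" was meant to be:

```python
from urllib.parse import urlsplit

import pandas as pd

def takedn(url):
    """Return the label just before the last dot of the host, e.g. 'one' for www.one.de."""
    return urlsplit(url).netloc.split('.')[-2]

# Inline stand-in for the first rows of listing.csv (assumption for this sketch).
df = pd.DataFrame({
    'nr': ['1', '2', '3'],
    'url': ['https://www.one.de', 'https://www.two.de', 'https://www.three.de'],
})

# Element-wise: takedn is called once per URL in the column.
df['domain_name'] = df['url'].apply(takedn)

# Alternative with string methods only: capture the host after '://',
# split it on dots, and take the second-to-last piece.
host = df['url'].str.extract(r'://([^/]+)', expand=False)
df['domain_vec'] = host.str.split('.').str[-2]

print(df)
```

Both columns should hold one, two, three for these rows. Note that the .str methods still loop over the strings internally, so this is mostly a matter of style rather than a guaranteed speed-up.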

Upvotes: 1
