NineWasps

Pandas: replace values in dataframe

I have a dataframe df

ID                                active_seconds  domain         subdomain          search_engine  search_term
0120bc30e78ba5582617a9f3d6dfd8ca  35              city-link.com  msk.city-link.com  None           None
0120bc30e78ba5582617a9f3d6dfd8ca  54              vk.com         vk.com             None           None
0120bc30e78ba5582617a9f3d6dfd8ca  34              mts.ru         shop.mts.ru        None           None
16c28c057720ab9fbbb5ee53357eadb7  4               facebook.com   facebook.com       None           None

and a list url = ['city-link.com', 'shop.mts.ru']. I need to update the subdomain column. If the subdomain equals one of the elements of url, leave it. If the subdomain is not in url but the domain is, overwrite the subdomain with the domain. In all other cases, leave the row unchanged. How can I do this with pandas? I tried a loop, but it is very slow:

urls = ['yandex.ru', 'vk.com', 'mail.ru']
# Row-by-row version: correct, but slow on large frames
for idx, domain, subdomain in zip(df.index, df['domain'], df['subdomain']):
    if subdomain in urls:
        continue  # subdomain is already in the list: leave it
    elif domain in urls:
        # domain is in the list but subdomain is not: write domain into subdomain
        df.at[idx, 'subdomain'] = domain
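To reproduce the problem, the sample rows from the question can be rebuilt as a DataFrame. This is a sketch that assumes the "None" entries in the listing are missing values rather than the literal string "None".

```python
import pandas as pd

# Rebuild the four sample rows from the question.
# Assumption: "None" in the listing means a missing value, not the string "None".
df = pd.DataFrame({
    "ID": ["0120bc30e78ba5582617a9f3d6dfd8ca"] * 3 + ["16c28c057720ab9fbbb5ee53357eadb7"],
    "active_seconds": [35, 54, 34, 4],
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
    "search_engine": [None] * 4,
    "search_term": [None] * 4,
})
print(df)
```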

Answers (1)

frist

First, select the rows whose domain is in the urls list:

domains_in_urls = df[df.domain.isin(urls)]

Next, from those rows, keep the ones whose subdomain is not in urls:

subdomains_not_in_urls = domains_in_urls[~domains_in_urls.subdomain.isin(urls)]
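To see what the two filters select, here they are applied to a trimmed copy of the question's sample data (only the domain and subdomain columns, which is an assumption made for brevity) with the url list from the question. Only row 0 has a listed domain, and its subdomain msk.city-link.com is not listed, so it is the only row left for the replacement.

```python
import pandas as pd

# Trimmed sample data: only the columns the filters look at
df = pd.DataFrame({
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
})
urls = ["city-link.com", "shop.mts.ru"]  # the list from the question

# Step 1: rows whose domain is in the list
domains_in_urls = df[df.domain.isin(urls)]
# Step 2: of those rows, the ones whose subdomain is NOT in the list
subdomains_not_in_urls = domains_in_urls[~domains_in_urls.subdomain.isin(urls)]

print(list(domains_in_urls.index))         # [0]
print(list(subdomains_not_in_urls.index))  # [0]
```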

Finally, overwrite the subdomain with the domain at those indexes in the original dataframe:

df.loc[subdomains_not_in_urls.index, 'subdomain'] = \
        df.loc[subdomains_not_in_urls.index, 'domain']
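The two filtering steps can also be folded into a single boolean mask, which avoids building the intermediate DataFrames. This variant is a rewrite of the same idea, not part of the original answer. The sketch uses the same trimmed sample data as above, with the url list from the question. Row 2 keeps shop.mts.ru because that subdomain is listed. Rows 1 and 3 are untouched because neither their domain nor their subdomain is listed.

```python
import pandas as pd

df = pd.DataFrame({
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
})
urls = ["city-link.com", "shop.mts.ru"]

# True where the domain is listed AND the subdomain is not
mask = df["domain"].isin(urls) & ~df["subdomain"].isin(urls)

# Overwrite subdomain with domain only on the masked rows; other rows are untouched
df.loc[mask, "subdomain"] = df.loc[mask, "domain"]

print(df["subdomain"].tolist())
# ['city-link.com', 'vk.com', 'shop.mts.ru', 'facebook.com']
```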
