Reputation: 2253
I have a dataframe df
ID active_seconds domain subdomain search_engine search_term
0120bc30e78ba5582617a9f3d6dfd8ca 35 city-link.com msk.city-link.com None None
0120bc30e78ba5582617a9f3d6dfd8ca 54 vk.com vk.com None None
0120bc30e78ba5582617a9f3d6dfd8ca 34 mts.ru shop.mts.ru None None
16c28c057720ab9fbbb5ee53357eadb7 4 facebook.com facebook.com None None
and a list url = ['city-link.com', 'shop.mts.ru'].
I need to update the subdomain column. If subdomain equals one of the elements of url, leave it as it is. If subdomain is not in url but domain is, I should overwrite subdomain with domain. In every other case, nothing should change.
How can I do this with pandas?
I tried to do it with a loop, but it takes a lot of time:
domains = df['domain']
subdomains = df['subdomain']
urls = ['yandex.ru', 'vk.com', 'mail.ru']
for (domain, subdomain) in zip(domains, subdomains):
    if subdomain in urls:
        continue
    elif domain in urls and subdomain not in urls:
        df['subdomain'].replace(subdomain, domain, inplace=True)
Upvotes: 0
Views: 246
Reputation: 1958
First, select the records whose domain field is in the urls list:
domains_in_urls = df[df.domain.isin(urls)]
Next, from these records, keep only those whose subdomain field is not in urls:
subdomains_not_in_urls = domains_in_urls[~domains_in_urls.subdomain.isin(urls)]
Finally, replace the subdomain field with the domain field at those index labels in the original dataframe:
df.loc[subdomains_not_in_urls.index, 'subdomain'] = \
    df.loc[subdomains_not_in_urls.index, 'domain']
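The same three steps can probably be collapsed into a single boolean mask, which avoids the intermediate dataframes. The snippet below is only a sketch: the sample frame is a shortened copy of the data in the question (the ID values are truncated for readability, and the other columns are left out), and the name mask is just an illustrative choice.

```python
import pandas as pd

# Sample data modelled on the question; IDs are shortened placeholders.
df = pd.DataFrame({
    "ID": ["0120", "0120", "0120", "16c2"],
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
})
urls = ["city-link.com", "shop.mts.ru"]

# True for rows where the domain is listed but the subdomain is not.
mask = df["domain"].isin(urls) & ~df["subdomain"].isin(urls)

# Overwrite subdomain with domain on the masked rows only.
df.loc[mask, "subdomain"] = df.loc[mask, "domain"]

print(df["subdomain"].tolist())
# ['city-link.com', 'vk.com', 'shop.mts.ru', 'facebook.com']
```

Only the first row changes here: its domain is in urls while its subdomain is not. The shop.mts.ru row is left alone because its subdomain is already in the list.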
Upvotes: 2