Petr Petrov

Reputation: 4452

Python and regex: create a template

I need to find a lot of substrings in a string, but it takes a long time, so I want to combine them into a single pattern.

I need to match strings like these:

003.ru/%[KEYWORD]%
1click.ru/%[KEYWORD]%
3dnews.ru/%[KEYWORD]%

where % stands for any sequence of characters and [KEYWORD] can be any of ['sony%xperia', 'iphone', 'samsung%galaxy', 'lenovo_a706'].
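A minimal sketch of turning one template plus one keyword into a regular expression. It assumes % means "any run of characters, possibly empty" (as in SQL LIKE) and that everything else, including the dots in domain names, should match literally. The helper name template_to_regex is hypothetical.

```python
import re

def template_to_regex(template: str, keyword: str) -> str:
    """Turn e.g. '003.ru/%[KEYWORD]%' plus a keyword into a regex string.

    Assumption: '%' means any run of characters (possibly empty);
    every other character is matched literally.
    """
    filled = template.replace('[KEYWORD]', keyword)
    # Escape each literal chunk, then join the chunks with a lazy wildcard.
    return '.*?'.join(re.escape(part) for part in filled.split('%'))

pattern = template_to_regex('003.ru/%[KEYWORD]%', 'sony%xperia')
print(bool(re.search(pattern, 'https://003.ru/catalog/sony-xperia-z5')))  # True
```

Using re.escape keeps the dot in 003.ru from matching an arbitrary character, which the hand-written character class approach does not guard against.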

I tried the following search:

keywords = ['sony%xperia', 'iphone', 'samsung%galaxy', 'lenovo_a706']
# '%' in the templates stands for one or more URL characters
wildcard = r'[a-zA-Z0-9\-_.?!@#$%^&*+=]+'
for key in keywords:
    # fill in the keyword, then expand every '%' into the wildcard
    coding['keyword_url'] = (coding.url.str.replace('[KEYWORD]', key, regex=False)
                                       .str.replace('%', wildcard, regex=False))
    for domain, keyword_url in zip(coding.domain, coding.keyword_url):
        df.loc[df.event_address.str.contains(keyword_url), 'domain'] = domain

Here df contains only the event_address column (URLs), and coding looks like this:

domain          url
003.ru          003.ru/%[KEYWORD]%
1CLICK          1click.ru/%[KEYWORD]%
33033.ru        33033.ru/%[KEYWORD]%
3D NEWS         3dnews.ru/%[KEYWORD]%
96telefonov.ru  96telefonov.ru/%[KEYWORD]%

How can I improve my patterns to make this faster?

Upvotes: 0

Views: 290

Answers (1)

Vyko

Reputation: 212

First, you should consider using the re module: compile your patterns once with re.compile and then reuse the compiled objects to match each URL.
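One way to apply this advice, sketched under assumptions: rather than calling str.contains once per (keyword, domain) pair, fold every template and keyword into one pattern, compile it once, and scan each URL a single time. The sample rows, the keyword list, and the helper names like_to_regex and find_domain are hypothetical stand-ins for the question's coding table, and % is assumed to mean any run of characters.

```python
import re

# Hypothetical sample data mirroring the question's coding table and keywords.
keywords = ['sony%xperia', 'iphone', 'samsung%galaxy', 'lenovo_a706']
coding = [
    ('003.ru', '003.ru/%[KEYWORD]%'),
    ('1CLICK', '1click.ru/%[KEYWORD]%'),
    ('3D NEWS', '3dnews.ru/%[KEYWORD]%'),
]

def like_to_regex(text: str) -> str:
    # Assumption: '%' = any run of characters (possibly empty); all else literal.
    return '.*?'.join(re.escape(part) for part in text.split('%'))

# All keywords as one non-capturing alternation.
keyword_alt = '(?:' + '|'.join(like_to_regex(k) for k in keywords) + ')'

# One named group per template row; the name d<i> encodes the row index.
parts = []
for i, (_, template) in enumerate(coding):
    body = keyword_alt.join(like_to_regex(p) for p in template.split('[KEYWORD]'))
    parts.append(f'(?P<d{i}>{body})')

combined = re.compile('|'.join(parts))  # compiled once, reused for every URL

def find_domain(url):
    m = combined.search(url)
    # lastgroup names the alternative that matched, e.g. 'd2'
    return coding[int(m.lastgroup[1:])][0] if m else None

urls = [
    'https://3dnews.ru/news/samsung-galaxy-s7',
    'https://1click.ru/catalog/iphone-6s',
    'https://example.com/iphone',
]
print([find_domain(u) for u in urls])  # ['3D NEWS', '1CLICK', None]
```

With pandas this could be applied as df['domain'] = df.event_address.map(find_domain). One behavioural difference: if a URL matches several templates, this returns the alternative that matches earliest in the URL (the first listed one on ties), whereas the loop in the question keeps the last domain it assigned.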

Upvotes: 1
