Erick

Reputation: 49

Match a pattern between the www. and .com parts of a web address

I am writing a program that extracts only the consonants from the part of a web address between www. and .com. For example, if I input www.google.com it should return 'ggl', but that doesn't happen.

import re

x=int(raw_input())

for i in range(x):
    inp1=raw_input()
    y=re.findall('^www\.[^(aeiou)]+\.com',inp1)
    print y
    inp2=y[0]
    print inp2

So what's the mistake in the line y=re.findall('^www\.[^(aeiou)]+\.com',inp1)?
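A minimal check in Python 3 (the question's code is Python 2, hence raw_input and print y) shows what happens. Inside [...] the parentheses are literal characters, not a group, and [^(aeiou)]+ has to match every character between www. and .com. The vowels in "google" therefore make the whole match fail, findall returns an empty list, and y[0] raises IndexError. Even when the pattern does match, it returns the whole address rather than just the consonants. The input www.bbc.com is only an illustrative example.

```python
import re

PATTERN = r'^www\.[^(aeiou)]+\.com'  # the question's pattern, as a raw string

# "google" contains vowels, so [^(aeiou)]+ cannot span it and nothing matches.
matches = re.findall(PATTERN, 'www.google.com')
print(matches)  # []

try:
    first = matches[0]  # equivalent to the question's inp2=y[0]
except IndexError:
    print('no match, so y[0] raises IndexError')

# With no vowels in the middle it does match, but it returns the whole address.
print(re.findall(PATTERN, 'www.bbc.com'))  # ['www.bbc.com']
```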

Upvotes: 1

Views: 95

Answers (2)

timgeb

Reputation: 78740

This can be done with a regex, and you don't need a variable-width lookbehind to achieve it. You can use a negative lookahead together with a positive lookahead:

>>> import re
>>> s = 'www.google.com'
>>> re.findall(r'(?!w{1,3}\.)([^aeiou\W])(?=.*\.com)', s)
['g', 'g', 'l']
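A sketch (Python 3) of how the pattern above could replace the question's findall call, joining the single-character matches into one string. The helper name consonants is hypothetical. Note the assumptions baked into the pattern: it expects lowercase input, [^aeiou\W] also keeps digits and underscores, and the negative lookahead rejects any "w" that is immediately followed by a dot, so for www.wow.com the second "w" is dropped.

```python
import re

# Negative lookahead: skip positions where one to three "w"s and a dot start.
# [^aeiou\W]: a word character that is not a lowercase vowel.
# Positive lookahead: ".com" must appear somewhere later in the string.
PATTERN = r'(?!w{1,3}\.)([^aeiou\W])(?=.*\.com)'

def consonants(address):
    """Join the single-character matches into one string."""
    return ''.join(re.findall(PATTERN, address))

print(consonants('www.google.com'))  # ggl
print(consonants('www.wow.com'))     # w, not ww: the second "w" precedes a dot
```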


Upvotes: 1

Andrew Cheong

Reputation: 30273

This is not possible with a regex. To find all matches while always checking for the preceding www., you'd need variable-width lookbehinds, which Python's re module does not allow.

If they were supported (again, they are not), the following regex would be what you are looking for:

y=re.findall('(?<=^www\..*)[^aeiou]+(?=.*?\.com)',inp1)
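Python's re module rejects that pattern when it is compiled, which a quick Python 3 check confirms, while a fixed-width lookbehind such as (?<=www\.) is accepted. This is only a sketch to illustrate the restriction, not a fix for the question.

```python
import re

# The variable-width lookbehind from the answer above (note the .* inside it).
VARIABLE = r'(?<=^www\..*)[^aeiou]+(?=.*?\.com)'

try:
    re.compile(VARIABLE)
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)

# A fixed-width lookbehind compiles and works.
print(re.findall(r'(?<=www\.)\w+', 'www.google.com'))  # ['google']
```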

The answer, however, is simply that you cannot do what you're looking to do with a regex.
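If splitting the work into two steps is acceptable (an alternative that neither answer proposes), a capture group can extract the part between www. and .com, and a plain Python filter can then drop the vowels. This is a minimal Python 3 sketch assuming addresses of the exact form www.<name>.com and treating every letter other than a, e, i, o, u (including y) as a consonant; consonants_between is a hypothetical helper.

```python
import re

def consonants_between(address):
    """Return the consonants between 'www.' and '.com', or '' if the form differs."""
    # Step 1: one regex with a capture group isolates the middle part.
    m = re.fullmatch(r'www\.(.+)\.com', address)
    if m is None:
        return ''
    # Step 2: keep letters that are not vowels (dots, digits etc. are dropped).
    return ''.join(c for c in m.group(1) if c.isalpha() and c.lower() not in 'aeiou')

for address in ['www.google.com', 'www.wow.com', 'example.org']:
    print(address, '->', repr(consonants_between(address)))
```

This prints 'ggl', 'ww' and '' respectively; unlike a single lookahead pattern, it keeps a "w" that sits right before a dot.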

Upvotes: 1
