user3295674

Reputation: 913

re: matching 'a href' tag

I have this simple program that reads a file from stdin and outputs only the host (example: returning only HOST).

Except when I run cat sample.html | python program.py, right now it outputs href="google.com

I want it to remove the href=" part and output only google.com, but when I tried removing it from my regex, the result got even worse. Thoughts?

import re
import sys

s = sys.stdin.read()
lines=s.split('\n')

match = re.search(r'href=[\'"]?([^\'" >]+)', s) #here
if match:
    print match.group(0)

Thank you.

Upvotes: 0

Views: 121

Answers (1)

hwnd

Reputation: 70750

That is because you reference group(0), which is the entire match (including the href=" prefix), when it should be group(1), which holds only the text captured by the parentheses.

if match:
    print match.group(1)
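
To make the difference concrete, here is a minimal sketch that reuses the pattern from the question. It is written for Python 3 (print as a function), which is an assumption since the question uses the Python 2 print statement. The sample string and the helper names first_href and all_hrefs are made up for illustration.

```python
import re

# Same pattern as in the question; the parentheses define capture group 1.
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')


def first_href(html):
    """Return the first href value found in html, or None if there is none."""
    match = HREF_RE.search(html)
    return match.group(1) if match else None


def all_hrefs(html):
    """Return every href value; with one capture group, findall yields group 1."""
    return HREF_RE.findall(html)


# Hypothetical input, standing in for the contents of sample.html.
sample = '<a href="google.com">Google</a> <a href=\'example.org\'>Example</a>'

match = HREF_RE.search(sample)
print(match.group(0))     # whole match: href="google.com
print(match.group(1))     # captured part only: google.com
print(all_hrefs(sample))  # ['google.com', 'example.org']
```

If the file contains more than one link, re.search only reports the first one, so findall (or finditer) may be closer to what you want. For anything beyond simple input, an HTML parser is likely to be more robust than a regex.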

Upvotes: 2
