Reputation: 913
I have a simple program that reads a file from stdin and should output only the host (example: returning only HOST).
Except when I run cat sample.html | python program.py, it currently outputs href="google.com
I want it to drop the href=" part and output only google.com, but when I tried removing that part from my regex, the result got even worse. Thoughts?
import re
import sys
s = sys.stdin.read()
lines=s.split('\n')
match = re.search(r'href=[\'"]?([^\'" >]+)', s) #here
if match:
    print match.group(0)
Thank you.
Upvotes: 0
Views: 121
Reputation: 70750
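That is because you print group(0), which is the entire text matched by the pattern, including the href=" prefix. Use group(1) instead: it holds only the text captured by the parentheses in your regex, which is the host.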
if match:
    print match.group(1)
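To see the difference side by side, here is a minimal sketch that runs the same regex against a hard-coded string instead of stdin. The sample HTML line is made up for illustration and is not the asker's sample.html. print is called with parentheses and a single argument so the snippet should behave the same on Python 2 and Python 3.

```python
import re

# Hypothetical input standing in for the contents of sample.html
html = '<a href="google.com">Search</a>'

# Same pattern as in the question: the parentheses define capture group 1
match = re.search(r'href=[\'"]?([^\'" >]+)', html)
if match:
    whole = match.group(0)  # entire match, including the href=" prefix
    host = match.group(1)   # only the text captured by the parentheses
    print(whole)
    print(host)
```

With this input, the first print shows the match with the href=" prefix still attached and the second shows just google.com.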
Upvotes: 2