Reputation: 1
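from BeautifulSoup import BeautifulSoup
import urllib2
import re

# Build the search URL from the user's query.
user = raw_input('begin here!: ')
base = "http://1337x.org/search/"
print (base + user)
add_on = "/0/"
total_link = base + user + add_on

# Fetch the results page with a plain GET (a second positional argument
# to urlopen would be sent as POST data).
html_data = urllib2.urlopen(total_link).read()
soup = BeautifulSoup(html_data)

# Find the first <a> tag whose href starts with "/announcelist".
announce = soup.find('a', attrs={'href': re.compile("^/announcelist")})
print announce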
I am attempting to retrieve a torrent link from a page, preferably the first non-sponsored link, and then print it. I am rather new to coding, so as much detail as you can give would be perfect. Thank you so much for the help!
Upvotes: 0
Views: 135
Reputation: 27090
The problem is in your regular expression. You are trying to use the ^ character to negate the regex, but it does not work in your situation. The ^ only negates a set of characters (the characters inside [ ]), and even then it only negates if it is the first character of the set. For example, [^aeiou] means "any character except a, e, i, o and u".
When you use ^ outside a character set, it matches the beginning of the string (or of each line, in multiline mode). For example, ^aeiou matches a string that starts with aeiou.
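To make the two meanings of ^ concrete, here is a minimal sketch using only Python's standard re module. The sample strings (including the href-like values) are made up for illustration and do not come from the actual page.

```python
import re

# Inside a character set, a leading ^ negates the set:
# [^aeiou] matches any single character that is not a, e, i, o or u.
consonants = re.findall("[^aeiou]", "announce")

# When ^ is not the first character of the set, it is an ordinary character.
literal_caret = re.findall("[a^]", "a^b")

# Outside a character set, ^ anchors the pattern to the start of the string.
starts_with = re.compile("^/announcelist")
at_start = bool(starts_with.search("/announcelist/123"))       # hypothetical href
in_middle = bool(starts_with.search("/torrent/announcelist"))  # hypothetical href

print(consonants)      # ['n', 'n', 'n', 'c']
print(literal_caret)   # ['a', '^']
print(at_start)        # True
print(in_middle)       # False
```

So the original pattern, ^/announcelist, selects values that begin with /announcelist rather than excluding them.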
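So, how would you negate a regex? Well, the best way I see is to use a negative lookahead, which is a group that starts with (?! and ends with ). For your case, it is pretty easy:
(?!/announcelist)
So, replace re.compile("^/announcelist") with re.compile("(?!/announcelist)") and it should work; at least it worked here :) One caveat: if the pattern is applied with search semantics (as re.search does), the lookahead can succeed at a later position in the string, so anchor it as ^(?!/announcelist) to make sure the check happens at the start.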
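As a rough sketch of how the negative lookahead behaves, the snippet below compares an unanchored and an anchored version under re.search. The href values are hypothetical stand-ins for links on the page, and the names loose, strict and first_other_link are introduced here only for illustration.

```python
import re

unanchored = re.compile("(?!/announcelist)")
anchored = re.compile("^(?!/announcelist)")

# Hypothetical href values, standing in for links found on the page.
hrefs = ["/announcelist/42", "/torrent/42/example/"]

# With search(), the unanchored lookahead can succeed at a later position
# (for example just after the leading slash), so it accepts every href.
loose = [h for h in hrefs if unanchored.search(h)]

# Anchoring with ^ forces the check to happen at the start of the string,
# so hrefs beginning with /announcelist are rejected.
strict = [h for h in hrefs if anchored.search(h)]

# First href that does not start with /announcelist, or None if there is none.
first_other_link = strict[0] if strict else None

print(loose)
print(strict)
print(first_other_link)
```

With an HTML parser, the anchored pattern would be passed wherever the original re.compile(...) call was used. Whether the first match is the link you actually want depends on the page's markup, so treat this as a starting point rather than a finished solution.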
Upvotes: 1