Reputation: 1487
I have to catch all the links of the topics in this page: https://www.inforge.net/xi/forums/liste-proxy.1118/
I've tried with this script:
import urllib.request
from bs4 import BeautifulSoup

# fetch the page and parse it
page = urllib.request.urlopen("https://www.inforge.net/xi/forums/liste-proxy.1118/")
soup = BeautifulSoup(page, "lxml")

# print the href of every anchor on the page
for link in soup.find_all('a'):
    print(link.get('href'))
but it prints all the links on the page, not just the topic links I'm after. Could you suggest a quick way to do it? I'm still a newbie and I've only recently started learning Python.
Upvotes: 0
Views: 57
Reputation: 43246
You can use BeautifulSoup to parse the HTML:
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'https://www.inforge.net/xi/forums/liste-proxy.1118/'
soup = BeautifulSoup(urlopen(url), 'lxml')
Then find only the topic links with

soup.find_all('a', {'class': 'PreviewTooltip'})
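
Putting the two steps together, a minimal sketch (assuming the topic anchors on that forum page still carry the PreviewTooltip class) would be:

from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urljoin

url = 'https://www.inforge.net/xi/forums/liste-proxy.1118/'
soup = BeautifulSoup(urlopen(url), 'lxml')

# keep only the anchors the forum marks as topic previews
for link in soup.find_all('a', {'class': 'PreviewTooltip'}):
    href = link.get('href')
    # the hrefs may be relative, so resolve them against the page URL
    print(urljoin(url, href))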
Upvotes: 2