How to scrape specific elements using BeautifulSoup in Python?

Question

I have a PHP page that contains a repeating block of markup that I'm interested in. Here's an example:


    Заткнись и танцуй (Shut Up and Dance)

        Date: 01.01.2017 20:51
        Audio: Multi-voice voice-over (LostFilm.TV)

What I'm interested in is the torrent title and its link. I tried finding the span by its class and then navigating from it to the link. Here is my attempt:

import requests
from bs4 import BeautifulSoup

url = 'http://www.lostfilm.tv/browse.php?'
lost_f = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# from_encoding only applies to bytes, so pass .content rather than .text
lost_soup = BeautifulSoup(lost_f.content, 'html.parser', from_encoding="windows-1251")
for item in lost_soup.find_all('span', {'class': 'torrent_title'}):
    print(item.text)
    print(item.previous_sibling.previous_sibling['href'])

This prints the correct name but an incorrect link. How can I get the torrent name together with its related link?

Mohammad Yusuf · Accepted Answer

Something like this?

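import re

import requests
from bs4 import BeautifulSoup

url = 'http://www.lostfilm.tv/browse.php?'
lost_f = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# Pass bytes (.content) so that from_encoding is actually applied
lost_soup = BeautifulSoup(lost_f.content, 'html.parser', from_encoding="windows-1251")
# Keep only links whose href looks like /browse.php?cat=<digits>
for a in lost_soup.find_all('a', {'href': re.compile(r'/browse\.php\?cat=\d+')}):
    print("HREF=", a['href'], "TITLE =", a.text)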
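
If you want to try the href filter without hitting the site, here is a minimal offline sketch. The markup in SAMPLE_HTML and the helper name category_links are made up for illustration; the real page layout is not shown in the question and may well differ.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; not taken from the real site.
SAMPLE_HTML = """
<ul>
  <li><a href="/browse.php?cat=245">Shut Up and Dance</a></li>
  <li><a href="/browse.php?cat=17">Another Episode</a></li>
  <li><a href="/details.php?id=9">Not a category link</a></li>
  <li><a href="/browse.php?o=20">Next page</a></li>
</ul>
"""


def category_links(html):
    """Return (href, title) pairs for links shaped like /browse.php?cat=<digits>."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex used as an attribute filter is matched with search()
    pattern = re.compile(r"/browse\.php\?cat=\d+")
    return [(a["href"], a.get_text(strip=True))
            for a in soup.find_all("a", href=pattern)]


links = category_links(SAMPLE_HTML)
for href, title in links:
    print("HREF=", href, "TITLE =", title)
```

Filtering on the href pattern avoids depending on where the link sits relative to the span, which is what makes sibling navigation fragile (whitespace between tags counts as a sibling too).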
