BeautifulSoup - which URL

Question

I started a little project. I am trying to scrape the URL http://pr0gramm.com/ and save the tags under a picture in a variable, but I am having trouble doing so.

I am looking for this in the page source:

Flaschenkind

I only need the text "Flaschenkind" to be saved, along with the other tags that follow it on that line.

This is my code so far:

import requests
from bs4 import BeautifulSoup

url = "http://pr0gramm.com/"
r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

links = soup.find_all("div", {"class" : "item-tags"})

print(links)

Unfortunately, I only get this output:

[]

I already tried changing the URL to http://pr0gramm.com/top/ but I get the same output. Could this be because the site is built with JavaScript, so the data I want is not in the HTML that gets scraped?

JimmyNJ · Accepted Answer

First off, the URL you are using is the JavaScript-enabled version of the site. They offer a static version at www.pr0gramm.com/static/, where you'll find the content formatted more like your example suggests you expect.
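
One way to check whether an element is present in the HTML the server sends, before any JavaScript runs, is to look for its class in the raw markup. The following is only a sketch: the two HTML strings are made-up stand-ins rather than the actual markup of pr0gramm.com, and has_element_with_class is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def has_element_with_class(html, class_name):
    """Return True if the given HTML contains any element with that class."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_=class_name) is not None


# Hypothetical markup: an empty shell that scripts would fill in later.
shell_html = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical markup: the same page with the tags already in the HTML.
rendered_html = (
    '<html><body><div class="item-tags">'
    '<a class="tag">Flaschenkind</a></div></body></html>'
)

print(has_element_with_class(shell_html, "item-tags"))
print(has_element_with_class(rendered_html, "item-tags"))
```

If the check fails on the HTML returned by requests.get, find_all has nothing to match, which would explain an empty list.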

Using this static version of the URL, I retrieved the tags with the code below, which is similar to yours except that I removed the class filter. It is written for Python 2.7.

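import bs4
import urllib2

def main():

    url = "http://pr0gramm.com/static/"
    try:
        # Fetch the static (non-JavaScript) version of the page.
        fin = urllib2.urlopen(url)
    except urllib2.URLError:
        print "Url retrieval failed url:", url
        return None

    html = fin.read()

    # Parse with html5lib (the html5lib package must be installed).
    bs = bs4.BeautifulSoup(html, "html5lib")

    # Collect every anchor element; no class filter is applied.
    links = bs.find_all("a")
    print links
    return None


if __name__ == "__main__":
    main()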
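
The code above targets Python 2.7. A rough Python 3 equivalent might look like the sketch below. It is not the answerer's code: extract_link_texts and fetch_link_texts are hypothetical helper names, the built-in "html.parser" is swapped in for html5lib to avoid an extra dependency, and the function returns the link text rather than printing the raw elements.

```python
from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_link_texts(html):
    """Return the non-empty visible text of every <a> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for link in soup.find_all("a"):
        text = link.get_text(strip=True)
        if text:  # skip anchors that contain no visible text
            texts.append(text)
    return texts


def fetch_link_texts(url):
    """Download a page and return its link texts, or None if retrieval fails."""
    try:
        with urlopen(url, timeout=10) as response:
            html = response.read()
    except URLError as exc:  # URLError also covers HTTPError
        print("Url retrieval failed url:", url, exc)
        return None
    return extract_link_texts(html)
```

Calling fetch_link_texts("http://pr0gramm.com/static/") should then give the text of every link on the static page. Which of those links are picture tags depends on the page's actual markup, so a narrower selector may be needed.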
