Is this the correct way to get items from a tag that has two classes with BeautifulSoup?

Question

I'd like to get items from a website with BeautifulSoup.

The target tag is a div whose class attribute holds two class names separated by whitespace (post item).

First, I wrote,

roots = soup.find_all("div", "post item")

But it didn't work. Then I wrote:

html.find_all("div", {'class':['post', 'item']})

I could get items with this, but I am not sure whether it is correct. Is this code correct?

//// Additional ////

I am sorry, but

html.find_all("div", {'class':['post', 'item']})

didn't work properly. It also extracts tags that only have class="item".

So I had to write

soup.find_all("div", class_="post item")

with class_= rather than class=. However, this doesn't work for me either... (>_<)

Target url:

https://flipboard.com/section/%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9-3uscfrirj50pdtqb

mycode:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from urllib.request import urlopen
from bs4 import BeautifulSoup

def main():
    target = "https://flipboard.com/section/%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9-3uscfrirj50pdtqb"
    html = urlopen(target)
    soup = BeautifulSoup(html, "html.parser")
    roots = soup.find_all("div", class_="post item")
    print(roots)
    for root in roots:
        print("##################")


if __name__ == '__main__':
    main()

Padraic Cunningham · Accepted Answer

You could use a CSS selector:

soup.select("div.post.item")

Or use class_:

soup.find_all("div", class_="post item")

The docs suggest that if you want to search for tags that match two or more CSS classes, you should use a CSS selector, as in the first example. They give an example of both uses:

You can also search for the exact string value of the class attribute:

css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

css_soup.select("p.strikeout.body")
# [<p class="body strikeout"></p>]
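To see how the three approaches differ, here is a small sketch run against made-up markup. The SAMPLE_HTML string and the texts helper are illustrative assumptions, not anything taken from the Flipboard page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not from the target site.
SAMPLE_HTML = """
<div class="post item">both, in order</div>
<div class="item post">both, reversed</div>
<div class="post item featured">both, plus an extra class</div>
<div class="item">item only</div>
<div class="post">post only</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(tags):
    """Return the text of each tag so the results are easy to compare."""
    return [tag.get_text() for tag in tags]


# CSS selector: every div that carries BOTH classes, in any order.
by_selector = texts(soup.select("div.post.item"))

# class_ with a string containing whitespace: exact match on the
# attribute value, so order and extra classes matter.
by_exact_string = texts(soup.find_all("div", class_="post item"))

# class_ with a list: matches a div that has ANY of the listed classes,
# which is why this form also picked up class="item" in the question.
by_list = texts(soup.find_all("div", class_=["post", "item"]))

print(by_selector)
print(by_exact_string)
print(by_list)
```

The exact-string form is order-sensitive and stops matching as soon as another class is added to the tag, which is why the selector is usually the safer choice.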

Your code fails, and any of the above solutions would fail too, because the class does not exist in the source the server returns. If it were there, they would all work:

In [6]: r = requests.get("https://flipboard.com/section/%E3%83%8B%E3%83%A5%E3%83%BC%E3%82%B9-3uscfrirj50pdtqb")

In [7]: cont = r.content

In [8]: "post item" in cont
Out[8]: False
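The same check can be written for Python 3 using urlopen, as in the question. Note that on Python 3 r.content is bytes, so the session above would need r.text (or a bytes literal) for the in test. This is only a sketch: the function names has_matching_div and fetch_raw_html are made up for illustration, and the default selector is an assumption based on the question.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def has_matching_div(html_text, selector="div.post.item"):
    """Return True if the HTML text contains at least one tag matching selector."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.select_one(selector) is not None


def fetch_raw_html(url, timeout=10):
    """Download the HTML exactly as the server sends it; no JavaScript is run."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs network access, so it is left commented out):
# print(has_matching_div(fetch_raw_html("https://flipboard.com/section/...")))
```

If has_matching_div returns False for the raw HTML, no find_all or select call will find the tag, however it is written.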

If you look at the page source in the browser and search for it, you won't find it either. It is generated dynamically and can only be seen if you open a developer console or Firebug. Those elements also contain only some styling and React ids, so I'm not sure what you expect to pull from them even if you did get them.

If you want to get the HTML that you see in the browser, you will need something like Selenium.
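A rough sketch of that approach, assuming Selenium 4 with a local Chrome install it can drive. The function names fetch_rendered_html and select_posts are made up for illustration, and I have not checked that div.post.item is what the rendered Flipboard page actually contains:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in a real browser and return the HTML after JavaScript has run."""
    # Imported lazily so select_posts below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        driver.get(url)
        # Wait until at least one matching element has been added to the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def select_posts(html_text, css_selector="div.post.item"):
    """Parse HTML text and return every tag matching the CSS selector."""
    return BeautifulSoup(html_text, "html.parser").select(css_selector)


# Usage (needs a browser and network access, so it is left commented out):
# html = fetch_rendered_html("https://flipboard.com/section/...", "div.post.item")
# for post in select_posts(html):
#     print(post.get_text(strip=True))
```

Once the rendered HTML is in hand, the BeautifulSoup part is the same as before; only the way the HTML is fetched changes.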
