How to check if tag <a> or/and <img> is the children of div on Beautiful Soup

Question

So, let's say I have a page with markup like this inside of the <div> tag

And I want to scrape it using the code below:

import requests
from bs4 import BeautifulSoup as soup

my_url = 'http://www.foo-url.com'

uClient = requests.get(my_url)
page_html = uClient.text
uClient.close()

page_soup = soup(page_html, "html.parser")

#Identify Each Post Group
containers = page_soup.findAll("div",{"class": "album-item"})

data = []

for container in containers:
    #Store Each Pictures To An Object
    items = container.findAll("a")

    for item in items:
        #Set The Link Location
        link_location = item.attrs['href']
        image_item = item.find("img")

        #Set The Image Location
        img_location = image_item.attrs['src']

        data.append((link_location, img_location))

    #Just In Case There Is Only An Image (No Link)
    imgs = container.findAll("img")

    for img in imgs:
        link_location = "NoLink"
        img_location = img.attrs['src']
        data.append((link_location, img_location))

for link_location, img_location in data:
    print(link_location + " | " + img_location)

And in the result, there are a lot of duplicates, like this:

http://www.foo.com/img/1 | http://thumbnail.foo.com/img/1.jpg
http://www.foo.com/img/2 | http://thumbnail.foo.com/img/2.jpg
http://www.foo.com/img/3 | http://thumbnail.foo.com/img/3.jpg
http://www.foo.com/img/4 | http://thumbnail.foo.com/img/4.jpg

NoLink | http://thumbnail.foo.com/img/1.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/2.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/3.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/4.jpg       #duplicate

NoLink | http://large.foo.com/img/5.jpg
NoLink | http://large.foo.com/img/6.jpg

http://www.foo.com/img/7 | http://thumbnail.foo.com/img/7.jpg
http://www.foo.com/img/8 | http://thumbnail.foo.com/img/8.jpg
http://www.foo.com/img/9 | http://thumbnail.foo.com/img/9.jpg
http://www.foo.com/img/10 | http://thumbnail.foo.com/img/10.jpg

NoLink | http://thumbnail.foo.com/img/7.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/8.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/9.jpg       #duplicate
NoLink | http://thumbnail.foo.com/img/10.jpg      #duplicate

NoLink | http://large.foo.com/img/11.jpg
NoLink | http://large.foo.com/img/12.jpg

My idea is to check inside of the <div>:

if all of the children are <a> tags, then do the for item in items: loop
else if all of the children are <img> tags, then do the for img in imgs: loop
but then what if there are both kinds of tag?

I am also not sure how to check for those tags.
For the first case, I tried to use if(container.select("img")), which should be false,
but the value is true because it detects the <img> tag that is inside of the <a> tag.

So, how should I approach this?

Keyur Potdar · Accepted Answer

What you want is tag.find_all() with recursive=False.

From the documentation:

If you call mytag.find_all(), Beautiful Soup will examine all the descendants of mytag: its children, its children’s children, and so on. If you only want Beautiful Soup to consider direct children, you can pass in recursive=False.
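
To make the difference concrete, here is a minimal sketch. The HTML snippet is an assumption (the markup from the question is not available), built to resemble the URLs in the posted output: two linked thumbnails and one bare image inside a single album-item div.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two <img> tags wrapped in <a>, one <img> directly in the <div>.
html = """
<div class="album-item">
  <a href="http://www.foo.com/img/1"><img src="http://thumbnail.foo.com/img/1.jpg"></a>
  <a href="http://www.foo.com/img/2"><img src="http://thumbnail.foo.com/img/2.jpg"></a>
  <img src="http://large.foo.com/img/5.jpg">
</div>
"""

container = BeautifulSoup(html, "html.parser").find("div", class_="album-item")

# Default search: every <img> descendant, including those nested inside <a>.
all_srcs = [img["src"] for img in container.find_all("img")]

# recursive=False: only <img> tags that are direct children of the <div>.
direct_srcs = [img["src"] for img in container.find_all("img", recursive=False)]

print(len(all_srcs), "descendant images,", len(direct_srcs), "direct child image")
```

The same reasoning explains why container.select("img") was truthy in the question: like the default find_all(), it matches descendants at any depth.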

In your code, change this line

imgs = container.findAll("img")

to

imgs = container.findAll("img", recursive=False)

Output:

http://www.foo.com/img/1 | http://thumbnail.foo.com/img/1.jpg
http://www.foo.com/img/2 | http://thumbnail.foo.com/img/2.jpg
http://www.foo.com/img/3 | http://thumbnail.foo.com/img/3.jpg
http://www.foo.com/img/4 | http://thumbnail.foo.com/img/4.jpg
NoLink | http://large.foo.com/img/5.jpg
NoLink | http://large.foo.com/img/6.jpg
http://www.foo.com/img/7 | http://thumbnail.foo.com/img/7.jpg
http://www.foo.com/img/8 | http://thumbnail.foo.com/img/8.jpg
http://www.foo.com/img/9 | http://thumbnail.foo.com/img/9.jpg
http://www.foo.com/img/10 | http://thumbnail.foo.com/img/10.jpg
NoLink | http://large.foo.com/img/11.jpg
NoLink | http://large.foo.com/img/12.jpg
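
For the "what if there are both tags" part of the question, one possible variation (not part of the answer above) is to walk the direct children once and branch on the tag name. This is only a sketch: extract_pairs is a hypothetical helper name, and the sample markup is assumed.

```python
from bs4 import BeautifulSoup

def extract_pairs(container):
    """Return (link, image) pairs from the direct <a> and <img> children."""
    pairs = []
    # Passing a list of names matches either tag; recursive=False skips nested tags.
    for child in container.find_all(["a", "img"], recursive=False):
        if child.name == "a":
            img = child.find("img")
            # Skip links that do not wrap an image.
            if img is not None and img.has_attr("src"):
                pairs.append((child.get("href", "NoLink"), img["src"]))
        elif child.has_attr("src"):
            # A bare <img> directly under the container has no link.
            pairs.append(("NoLink", child["src"]))
    return pairs

# Hypothetical markup mixing both cases in one container.
html = """
<div class="album-item">
  <a href="http://www.foo.com/img/1"><img src="http://thumbnail.foo.com/img/1.jpg"></a>
  <img src="http://large.foo.com/img/5.jpg">
</div>
"""
container = BeautifulSoup(html, "html.parser").find("div", class_="album-item")
pairs = extract_pairs(container)
for link_location, img_location in pairs:
    print(link_location + " | " + img_location)
```

Unlike the two separate loops, this keeps the pairs in document order, so linked and unlinked images appear in the sequence they have on the page.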
