Naveen G

Reputation: 25

Python : extracting image source from HTML tag using Beautiful Soup

I'm trying to print only the value of the image's src attribute. I can print the whole img tag, but I can't get the src value on its own.

import urllib3
import certifi
from urllib3 import PoolManager
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
manager = PoolManager(num_pools=3, cert_reqs='CERT_REQUIRED',
                      ca_certs=certifi.where())

base_url = "https://app.tipotapp.com/docs/quickstart/"
page = manager.request('GET', base_url)
soup = BeautifulSoup(page.data, 'html.parser')
idd = 'creating-an-application'

for sibling in soup.find(id=idd).next_siblings:
    if sibling.name is None:
        continue
    elif sibling.name != 'h2':
        print(sibling.getText())
        if sibling.img is not None:
            print(sibling.img)
            #print(sibling.select_one("img"))
        else:
            continue
    else:
        break

The output I'm getting now is:

First it prints some expected strings, then the following:

<img alt="Student Management System" src="https://app.tipotapp.com/docs/images/quickstart/image_004.png"/>

From that, I want to print only the src value.

Upvotes: 0

Views: 450

Answers (1)

Keyur Potdar

Reputation: 7238

To get the value of an attribute, index the tag like a dictionary; this calls the tag's __getitem__(self, key) method.

tag[key] returns the value of the key attribute for the tag and raises a KeyError if the tag has no such attribute.
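A minimal offline sketch of that behavior, using a made-up HTML snippet (the example.com URL and the alt texts are placeholders, not taken from the quickstart page). It also shows Tag.get(), which returns None, or a default you pass, instead of raising when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one img with a src attribute, one without.
html = '<p><img alt="with src" src="https://example.com/a.png"/><img alt="no src"/></p>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("img")

# Indexing a tag looks the key up in tag.attrs.
print(first["src"])  # https://example.com/a.png

# A missing attribute raises KeyError...
try:
    second["src"]
except KeyError:
    print("no src attribute")

# ...while .get() returns None (or the default you supply) instead.
print(second.get("src"))         # None
print(second.get("src", "n/a"))  # n/a
```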

Just replace the line print(sibling.img) with

print(sibling.img['src'])

Output:

https://app.tipotapp.com/docs/images/quickstart/image_002.png
https://app.tipotapp.com/docs/images/quickstart/image_002_1.png
https://app.tipotapp.com/docs/images/quickstart/image_004.png
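Since the quickstart page may have changed since this was written, here is a self-contained sketch of the same loop run against a hypothetical inline HTML snippet instead of the live URL. The markup, the image_sources helper, and the example.com URLs are made up for illustration; the fix itself is the same sibling.img['src'] lookup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the quickstart page: a section heading, some
# content siblings (one containing an image), then the next h2.
html = """
<h2 id="creating-an-application">Creating an application</h2>
<p>Some text.</p>
<p><img alt="Student Management System" src="https://example.com/image_004.png"/></p>
<h2 id="next-section">Next section</h2>
<p><img alt="Not collected" src="https://example.com/other.png"/></p>
"""

def image_sources(soup, section_id):
    """Collect img src values between the given heading and the next h2."""
    srcs = []
    for sibling in soup.find(id=section_id).next_siblings:
        if sibling.name is None:  # skip text nodes such as newlines
            continue
        if sibling.name == "h2":  # reached the next section
            break
        if sibling.img is not None:  # first img inside this sibling
            srcs.append(sibling.img["src"])
    return srcs

soup = BeautifulSoup(html, "html.parser")
for src in image_sources(soup, "creating-an-application"):
    print(src)
```

Note that sibling.img only finds the first img inside each sibling; if a sibling can contain several images, iterating over sibling.find_all("img") would be needed to collect all of them.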

Upvotes: 1
