Reputation: 112
I am scraping data from a site and need to extract one img. I can find the tag, but the output is not what I need.
I have looked online for solutions and tried changing the code, but nothing worked.
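import requests
from bs4 import BeautifulSoup

# baseurl is the URL of the page being scraped (not shown in the question)
r = requests.get(baseurl)
content = r.content
soup = BeautifulSoup(content, "html.parser")
# take the second <img> tag on the page
images = soup.findAll('img')[1]
print(images)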
Output I get:
<img src="https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png" style="border-radius: 5px"/>
Output I need:
cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png
(I tried print(images.text))
Upvotes: 3
Views: 16861
Reputation: 1188
You can get the img tag's src attribute using:
images = soup.findAll('img')[1]
print(images.get("src"))
or
images = soup.findAll('img')[1]
print(images['src'])
Output
https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png
The problem with print(images.text) is that .text extracts the text between an opening and a closing tag, whereas the URL you want is stored in an attribute inside the tag itself.
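Note that the src value still includes the https:// prefix, while the question asks for the URL without it. A minimal sketch of one way to drop the scheme, using only the standard library; the helper name strip_scheme is hypothetical, and it assumes the URL has no query string you need to keep:

```python
from urllib.parse import urlsplit


def strip_scheme(url):
    """Return host + path of a URL, dropping the scheme (hypothetical helper)."""
    parts = urlsplit(url)
    # netloc is the host, path is everything after it; query and fragment are dropped
    return parts.netloc + parts.path


# In the scraper this would be the value of images["src"]
src = "https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png"
print(strip_scheme(src))
```

With the src from the question, this should print the URL starting at cdn.rubyrealms.com.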
Hope this helps you :)
Upvotes: 4
Reputation: 69
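Here's a sample from the Python html.parser documentation that you can adapt. It assumes parser is an instance of an HTMLParser subclass whose handlers print each start tag and its attributes: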
parser.feed('<img src="python-logo.png" alt="The Python logo">')
Start tag: img
attr: ('src', 'python-logo.png')
REFERENCE: https://docs.python.org/3/library/html.parser.html
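The snippet above does not show how parser is defined. A rough sketch of a self-contained version that collects src values instead of printing them; the class name ImgSrcParser and the sources attribute are made up for illustration:

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. ('src', 'python-logo.png')
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


parser = ImgSrcParser()
parser.feed('<img src="python-logo.png" alt="The Python logo">')
print(parser.sources)
```

Self-closing tags such as `<img ... />` also reach handle_starttag, because the default handle_startendtag calls it.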
Upvotes: 1