Murat Dikici

Reputation: 13

How can I extract the img src strings from a BeautifulSoup result?

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

start_url = 'https://www.example.com'
downloaded_html = requests.get(start_url)
soup = BeautifulSoup(downloaded_html.text, "lxml")
full_header = soup.select('div.reference-image')
full_header

The output of the above code is:

[<div class="reference-image"><img src="Content/image/all/reference/c101.jpg"/></div>,
 <div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>,
 <div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>]

I would like to extract the src value of each img tag, like this:

["Content/image/all/reference/c101.jpg",
 "Content/image/all/reference/c102.jpg",
 "Content/image/all/reference/c102.jpg"]

How can I extract it?

Upvotes: 1

Views: 66

Answers (1)

Joshua Varghese

Reputation: 5202

To get that, iterate through the result and read the src attribute of the img tag inside each div:

img_srcs = []
for i in full_header:
    img_srcs.append(i.find('img')['src'])

This gives:

['Content/image/all/reference/c101.jpg', 'Content/image/all/reference/c102.jpg', 'Content/image/all/reference/c102.jpg']

Here is the same thing as a one-line list comprehension:

img_srcs = [i.find('img')['src'] for i in full_header]
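Note that i.find('img') returns None for a div that has no img inside, so indexing it with ['src'] would raise a TypeError. If that can happen on the real page, one possible variant is to select the img tags directly with a CSS selector. This is only a sketch: the inline HTML stands in for the downloaded page, the empty fourth div is an assumption added to show the edge case, and "html.parser" is used instead of "lxml" just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question's output; the last (empty) div is
# an assumed edge case, not something seen on the real page.
html = """
<div class="reference-image"><img src="Content/image/all/reference/c101.jpg"/></div>
<div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>
<div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>
<div class="reference-image"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only img tags that sit inside div.reference-image and have a src
# attribute, so divs without an image are skipped instead of raising.
img_srcs = [img["src"] for img in soup.select("div.reference-image img[src]")]
print(img_srcs)
```

With the real page, the same selector can be applied to the soup object built from downloaded_html.text.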

Upvotes: 2
