Carlisle

Reputation: 53

Scrape IMG SRC under DIV tag Using BeautifulSoup

I'm trying to get the src of an image that sits inside a div tag on an Engadget page, but my code raises KeyError: 'src'.

HTML from Engadget.com blog page 10

Here's my code:

import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup

# csv_writer (a csv.writer) is set up earlier in the script; not shown here
for page in range(1,4):
    # code that gets dynamic URL
    url = sys.argv[1] + "{}".format(page)
    print(url)
    html=urlopen(url)
    soup=BeautifulSoup(html,"lxml")

    for article in soup.find_all('article',class_='o-hit'):
        div=soup.find('div',{"class":"o-rating_thumb@m-"})
        img_src = div.find('img').attrs['src']
        #img_src = article.find('div',class_ ='o-rating_thumb c-white').img['src']
        headline = article.h2.text.strip()

        summary = article.find('p',class_ ='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text.strip()

        #img_src = "none"

        print(headline)
        print(summary)
        print(img_src)
        csv_writer.writerow([headline,summary,img_src])

The web page is here: Engadget blog page 10
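One more thing worth checking in the loop above: it calls soup.find(...) rather than article.find(...), so every iteration searches the whole page and lands on the same div. The sketch below uses made-up, simplified markup (not Engadget's actual HTML; the class name o-rating_thumb is shortened for illustration) to show the difference in scope:

```python
from bs4 import BeautifulSoup

# Two hypothetical articles, each with its own thumbnail div (simplified markup)
html = """
<article class="o-hit"><div class="o-rating_thumb"><img src="a.jpg"></div></article>
<article class="o-hit"><div class="o-rating_thumb"><img src="b.jpg"></div></article>
"""
soup = BeautifulSoup(html, "html.parser")

page_scoped = []
article_scoped = []
for article in soup.find_all("article", class_="o-hit"):
    # soup.find searches the whole document, so it returns the first div every time
    page_scoped.append(soup.find("div", class_="o-rating_thumb").img["src"])
    # article.find searches only inside the current article
    article_scoped.append(article.find("div", class_="o-rating_thumb").img["src"])

print(page_scoped)     # ['a.jpg', 'a.jpg']
print(article_scoped)  # ['a.jpg', 'b.jpg']
```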

Upvotes: 1

Views: 5942

Answers (2)

Bitto

Reputation: 8215

For the topmost news item on each page, you can get the image URL directly from the 'src' attribute.

First navigate to the div that contains the image with the find() method. Then find the img tag inside that div and read its source from the tag's attributes.

import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
div=soup.find('div',{"class":"o-rating_thumb@m-"})
print(div.find('img').attrs['src'])

Output:

https://o.aolcdn.com/images/dims?resize=810%2C455&crop=810%2C455%2C0%2C0&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1400%252C933%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1066%26image_uri%3Dhttp%253A%252F%252Fo.aolcdn.com%252Fhss%252Fstorage%252Fmidas%252F85a4e2b124ba329ab520e80e306f07eb%252F206517051%252FIMG_5243e.jpg%26client%3Da1acac3e1b3290917d92%26signature%3Dcea6158d0bf02768d31ee67f2694be6cafaf200c&client=amp-blogside-v2&signature=08a97a1109f1c3287c6766fa284104c6f78770fe
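A note on the class lookups used here and in the other answer: BeautifulSoup treats class as a multi-valued attribute, so a single class name matches any element that has that class among others, while a string containing spaces (such as 'o-rating_thumb c-white') is compared against the full attribute value, in the same order. A minimal sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div carrying two class names
html = '<div class="o-rating_thumb c-white"><img src="x.jpg"></div>'
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element whose class list contains it
by_one_class = soup.find("div", class_="o-rating_thumb")

# A string with spaces must equal the whole class attribute, in the same order
exact = soup.find("div", class_="o-rating_thumb c-white")
reordered = soup.find("div", class_="c-white o-rating_thumb")

print(by_one_class is not None)  # True
print(exact is not None)         # True
print(reordered is None)         # True
```

In practice, matching on one distinctive class name tends to be less brittle than matching the full class string, since the latter breaks if the site adds or reorders classes.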

Edit: to scrape all the news items on a page:

Although the first image has a src attribute, the subsequent images store their URL in the data-original attribute instead (you can confirm this in the page source). That is most likely why you are getting KeyError: 'src'.

I was able to scrape all the news items like this:

import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
articles=soup.find_all('article',{"class":"o-hit"})
for article in articles:
    print("Heading: ", article.find('h2').text.strip())#heading
    print("Summary: ", article.find('p').text.strip())#summary
    print("Image Source:", article.find('img').attrs['data-original'])#image src
    print()

Output:

Heading:  Netflix will remove user reviews from its website next month
Summary:  Last year five-star ratings got the ax, and now written reviews will fade away too.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F884e68f9a829f3a26db5b729f00ccd03%2F206508290%2FEnglish.jpg&client=amp-blogside-v2&signature=b37eb21e95cef8cebe1f3c741b8bb29eb3471dcc

Heading:  Smart ForTwo Electric Drive quick spin review
Summary:  The saddest way to spend $25,000.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2Fedbdfdfeff2e77567cd0c4a73484d108%2F206502307%2Fsmartfortwo.jpg&client=amp-blogside-v2&signature=a9fc05d80d4b4d8ba6ef33453510c138632bab81

Heading:  Vivo's all-screen NEX S is a frustrating glimpse of the future
Summary:  Spoiler alert: It's really cool, but don't bother importing one.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F29%2F5b36ac0e523dc352bd46785a%2F5b36aedc884c2354eb33d663_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=725c8033196a2ae3500e2144830d14b03e7abc0e

Heading:  Sonos Beam review: Smart features trump minor audio compromises
Summary:  Bringing the soundbar into the smart home era.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F27%2F5b32f579523dc352bd3f66f3%2F5b32fbf2884c2354eb33d62f_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=4ad311aeb5cb23907fd99ec12d962b148646163d

Heading:  BlackBerry KEY2 review: The undisputed keyboard king
Summary:  This is the best Android-powered BlackBerry, if that means anything to you.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F26%2F5b3188ee523dc36212a7ff02%2F5b318be5802b94347b7e586b_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=5438cdf814480be5856d38db73695f86ade186ea

Heading:  Amazon Echo Look review: Good selfie taker, so-so stylist
Summary:  An AI is no match for my style instincts.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F25%2F5b30cbfce880db6107cb7ad0%2F5b30cde61aa5fc22c7bbf187_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=308e9f00afcb968da05823ce0d0718ccc6e43cb4

Heading:  Mitsubishi’s Outlander Plug-In Hybrid is an understated surprise
Summary:  Mitsubishi is back, even though it actually never left.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bc80f523dc36212a2be79%2F5b2bc8a6884c2319c410c008_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=a00b8466fa281051de4d64b1223fe99f97315985

Heading:  Amazon Fire TV Cube review: Alexa still needs work as a TV guide
Summary:  This device was bound to be made at some point, but is it worth it?
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bb81edbaab36faf00ed0e%2F5b2bddfb884c2319c410c00c_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=baa2db64e12d013ab712d823238fc3efeee693f8

Heading:  HTC U12+ review: Fundamentally flawed
Summary:  The phone's pressure-sensitive power and volume keys are kinda the worst.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b28cd94f50775726418990a%2F5b2bd7d4b46ab33c496c1607_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=8518ce5c141fb85b935794fbd3bd283d32508484
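Because the first image exposes its URL in src and the later ones in data-original, a small helper that checks both (and tolerates a missing img) avoids the KeyError as well as an AttributeError on None. This is a sketch under the assumption that data-original, when present, holds the real URL while src may be a placeholder; the helper name image_url and the markup are made up for illustration:

```python
from bs4 import BeautifulSoup

def image_url(container):
    """Return the image URL of the first <img> inside container, or None."""
    img = container.find("img")
    if img is None:
        # No <img> at all: return None instead of failing on None.attrs
        return None
    # Tag.get returns None for a missing attribute instead of raising KeyError.
    # Prefer data-original (assumed to hold the real URL on lazy-loaded images),
    # then fall back to src.
    return img.get("data-original") or img.get("src")

# Hypothetical markup covering the cases discussed above
html = """
<article><img src="first.jpg"></article>
<article><img src="placeholder.gif" data-original="real.jpg"></article>
<article><p>No image here</p></article>
"""
soup = BeautifulSoup(html, "html.parser")
urls = [image_url(article) for article in soup.find_all("article")]
print(urls)  # ['first.jpg', 'real.jpg', None]
```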

Upvotes: 2

chitown88

Reputation: 28565

from bs4 import BeautifulSoup
import requests
import time


for page in range(1, 11):

    url = 'https://www.engadget.com/reviews/latest/page/%s/' % (page)
    time.sleep(10)  # pause between requests to go easy on the server

    print('Page: %s' % (page))
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    articles = soup.find_all('article', class_='o-hit')

    for article in articles:

        # these thumbnails keep their image URL in data-original rather than src
        img_src = article.find('div', class_='o-rating_thumb c-white').img['data-original']
        headline = article.h2.text.strip()
        summary = article.find('p', class_='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text

        print(headline)
        print(summary)
        print(img_src)
        print('\n')

Output (which you can then write to a CSV file):

Page: 1
Surface Studio 2 review: A better all-in-one PC twist
But Microsoft could still go further.
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F20%2F5c1bfc61c0e0af2854a7c103%2F5c1bfcaa3278fb29ca5cf249_o_U_v1.jpg&client=amp-blogside-v2&signature=3c4be6997ee8e877ee7f62ad8d52409232f02ce9


Nikon Z6 review: The best full-frame mirrorless camera for video
10-bit external video, in-body stabilization and a full sensor readout.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1111%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1111%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Ff152c7e0-045b-11e9-bfc7-4d357297511c%26client%3Da1acac3e1b3290917d92%26signature%3Dd3865a04724a29f29b2bd3f6941dcddf9d494bcc&client=amp-blogside-v2&signature=7a72e03b6995fc31e3415c68279a4b038c979ea8


Brava's light-powered smart oven is too expensive to make sense
Preset cook programs can be limiting as well. 
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F19%2F5c1a9bf4fcd67b52d409586b%2F5c1a9f935b5d1b6ddb3a80d6_o_U_v1.png&client=amp-blogside-v2&signature=5f899a54c84b54b651d90824b67587f29677c858


PlayStation Classic review: A disappointing dose of nostalgia
Sony learned nothing from Nintendo.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C928%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C928%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F815fbe10-fd95-11e8-bde6-bdfd52a1c25a%26client%3Da1acac3e1b3290917d92%26signature%3D01ffdb2c7bc74497ae5f2a734feab08629996703&client=amp-blogside-v2&signature=98ae8e659929f4dd7f97d886d00b65303ab18059


Moment's 58mm lens is a portrait machine
The company's new tele lens fixes everything that was wrong with its 2014 model
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F39511ab0-f8d1-11e8-bbae-a119d499ba30%26client%3Da1acac3e1b3290917d92%26signature%3Dc246fb260ca480d6a9a4acb6f91cf10974f32c9a&client=amp-blogside-v2&signature=7dd3cc98cb10045b323dba54e86f2c70c2aa99b4


’Super Smash Bros. Ultimate’ is the perfect nostalgia bomb
It's a must-own for every Nintendo Switch owner.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C900%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C900%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fd8ab34b0-f90d-11e8-befe-815318929941%26client%3Da1acac3e1b3290917d92%26signature%3D4a5feb09e95c4f55ad0e1f8d6322734588ff76f6&client=amp-blogside-v2&signature=3a61128cae6560efafec7efc14211c2187851234


Mercedes’ GLE sports impressive suspension technology
MBUX and the new E-Active Body Control suspension enhance an already splendid SUV.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fc86bc1f0-f7fa-11e8-b77f-844a0908350f%26client%3Da1acac3e1b3290917d92%26signature%3D26e4fe19d43ad8d8f349edc95baf1790e10deecd&client=amp-blogside-v2&signature=f74436bc107af53e468d2d9ece4e88b37cff0a10


Mighty Vibe review: A much improved iPod Shuffle for Spotify
The second-gen model makes some much-needed improvements.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F1ffaaf40-f4e9-11e8-bbdf-b9d9c8fe5ee1%26client%3Da1acac3e1b3290917d92%26signature%3D18cbcc4c82d1c7d9f542dc80c57a4486c318526a&client=amp-blogside-v2&signature=e6d3badcf43ac938173f5353aa3a750c82b72bc3


Google Pixel Slate review: The burden of bad software
Back to the drawing board, Google.
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-11%2F30%2F5c008cf7600c9a1890e1305b%2F5c008d483a4f8c07678d8eb0_o_U_v1.jpg&client=amp-blogside-v2&signature=bbe180eb62cfe43f5241e74c1b7328c70da134c9


Page: 2
Dolby Dimension review: Excellent sound, exorbitant price
At $599, these headphones are too expensive for most, no matter how good they are.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fc6b58350-f250-11e8-8fff-afce4122ee12%26client%3Da1acac3e1b3290917d92%26signature%3D1bbe23e198c73bba2cd252a57f0598bcc32b374b&client=amp-blogside-v2&signature=65f2ee40f3cb346e473d6b188a6939317b4bebad


Nikon Z7 review: Great photos, great video, imperfect autofocus
It’s a strong full-frame mirrorless debut.

https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1019%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1019%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F07cabcc0-ed6f-11e8-af6d-2e14e29f20d0%26client%3Da1acac3e1b3290917d92%26signature%3D0a9704430483d81de5ebc6d4df50feca2228634e&client=amp-blogside-v2&signature=acf075d9ba15703a43abfcb705bd2ccd73e39ac7


All of Amazon's new Echo speakers reviewed
So how good do the new Echo Plus, Dot and Sub really sound?
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fa1d1ba20-ed16-11e8-b9ad-ed849065b748%26client%3Da1acac3e1b3290917d92%26signature%3D55c2bb811c59d95f942f200aec6af5fb35d6e0fc&client=amp-blogside-v2&signature=3a5c7fdde7829033245cec14439a5463c077d702

... ... ...
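Since the output is meant to end up in a CSV file, here is a minimal sketch using the standard csv module. The function name write_reviews, the file name reviews.csv, and the placeholder image URLs are assumptions for illustration. The answer's summary uses .text without .strip(), which can leave a trailing newline (as in the Nikon Z7 entry), so the sketch strips each value before writing.

```python
import csv

def write_reviews(path, rows):
    """Write (headline, summary, img_src) rows to a CSV file with a header row.

    Each field is stripped so stray newlines from .text do not end up in the cells.
    """
    # newline="" lets the csv module control line endings, as its docs recommend
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["headline", "summary", "img_src"])
        writer.writerows([field.strip() for field in row] for row in rows)

# Example rows; the image URLs are placeholders, not real Engadget links
rows = [
    ("Nikon Z7 review: Great photos, great video, imperfect autofocus",
     "It’s a strong full-frame mirrorless debut.\n",
     "https://example.com/z7.jpg"),
    ("Google Pixel Slate review: The burden of bad software",
     "Back to the drawing board, Google.",
     "https://example.com/pixel-slate.jpg"),
]
write_reviews("reviews.csv", rows)
```

In the scraping loop, you would append (headline, summary, img_src) to a list on each iteration and call write_reviews once at the end.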

Upvotes: 0
