Tom Crews

Reputation: 135

Cannot access ['src'] attribute of <source> tag with BeautifulSoup

I am building a script to download an .mp4 file from a specified Gfycat webpage using requests and BeautifulSoup. I have run into an error where I cannot access the 'src' attribute of a source tag. I am targeting the following HTML element:

<source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">

My code works when I replace the tag and attribute with 'a' and 'href', respectively, so I am not sure why I am unable to access this 'src' attribute. Code is below:

import requests
from bs4 import BeautifulSoup

gyfyUrl = 'https://gfycat.com/PoshDearAsianporcupine'

# creating a response object
r = requests.get(gyfyUrl)

# creating beautiful soup object
soup = BeautifulSoup(r.content,'html5lib')

# finding source tags in page
sourceTags = soup.findAll('source')

# printing found tags for clarity
print(sourceTags)

# printing src attribute within source tags - Error
for tag in sourceTags:
    print(tag['src'])

Upvotes: 0

Views: 490

Answers (1)

cody

Reputation: 11157

The issue here is that not every source tag has a src attribute; in this case, the very first one does not, so tag['src'] raises a KeyError on it. You can use a conditional list comprehension like the following to collect all src attributes that exist:

srcs = [tag["src"] for tag in sourceTags if "src" in tag.attrs]

Result:

['https://giant.gfycat.com/PoshDearAsianporcupine.webm', 'https://giant.gfycat.com/PoshDearAsianporcupine.mp4', 'https://thumbs.gfycat.com/PoshDearAsianporcupine-mobile.mp4']
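As a rough sketch of two alternatives to the attrs check: tag.get() returns None rather than raising when the attribute is missing, and find_all() accepts src=True to match only tags that have the attribute at all. The HTML string below is a made-up stand-in for the real page (so the example runs without a network request), and the 'html.parser' backend is used only to avoid the html5lib dependency; treat both as assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the first <source> tag
# has no src attribute, mimicking the situation described above.
html = """
<video>
  <source data-placeholder="1" type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.webm" type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">
</video>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: tag.get("src") returns None instead of raising KeyError,
# so tags without the attribute can simply be filtered out.
srcs_get = [tag.get("src") for tag in soup.find_all("source") if tag.get("src")]

# Option 2: src=True restricts the search to tags that have a src attribute.
srcs_filter = [tag["src"] for tag in soup.find_all("source", src=True)]

# Keep only the .mp4 links, since that is what the question wants to download.
mp4_urls = [src for src in srcs_filter if src.endswith(".mp4")]

print(srcs_filter)
print(mp4_urls)
```

Either option should give the same list as the comprehension above; filtering on the file extension afterwards is just one simple way to pick out the .mp4 link.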

Upvotes: 1
