Reputation: 135
I am building a script to download a .mp4 file from a specified Gfycat webpage using requests and BeautifulSoup. I have run into an error where I cannot access the 'src' attribute of a source tag. I am targeting the following HTML element:
<source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">
My code works when I replace the tag and attribute with 'a' and 'href', respectively, so I am not sure why I am unable to access this 'src' attribute. Code is below:
import requests
from bs4 import BeautifulSoup
gyfyUrl = 'https://gfycat.com/PoshDearAsianporcupine'
# creating a response object
r = requests.get(gyfyUrl)
# creating beautiful soup object
soup = BeautifulSoup(r.content,'html5lib')
# finding source tags in page
sourceTags = soup.findAll('source')
#printing found tags for clarity
print(sourceTags)
# printing src attribute within source tags - Error
for tag in sourceTags:
    print(tag['src'])
Upvotes: 0
Views: 490
Reputation: 11157
The issue here is that not every source tag has a src attribute; in this case, the very first one does not, so indexing it with tag['src'] fails. You can use a conditional list comprehension like the following to collect every src attribute that exists:
srcs = [tag["src"] for tag in sourceTags if "src" in tag.attrs]
Result:
['https://giant.gfycat.com/PoshDearAsianporcupine.webm', 'https://giant.gfycat.com/PoshDearAsianporcupine.mp4', 'https://thumbs.gfycat.com/PoshDearAsianporcupine-mobile.mp4']
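For anyone who wants to try this without hitting the network, here is a minimal sketch of the same idea on a hard-coded snippet. The markup is hypothetical (it only imitates a page whose first source tag lacks src), the variable names are mine, and it uses the built-in "html.parser" instead of html5lib to avoid the extra dependency. It shows the attrs check from above next to two alternatives that should behave the same way: Tag.get, which returns None for a missing attribute, and find_all with src=True, which only matches tags that have the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: the first <source> has no src attribute.
html = """
<video>
  <source type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.webm" type="video/webm">
  <source src="https://giant.gfycat.com/PoshDearAsianporcupine.mp4" type="video/mp4">
</video>
"""

soup = BeautifulSoup(html, "html.parser")
source_tags = soup.find_all("source")

# Option 1: keep only tags whose attribute dict contains "src".
srcs_checked = [tag["src"] for tag in source_tags if "src" in tag.attrs]

# Option 2: Tag.get returns None instead of raising KeyError when "src" is missing.
srcs_get = [tag.get("src") for tag in source_tags if tag.get("src")]

# Option 3: let find_all filter; src=True matches only tags that have a src attribute.
srcs_filtered = [tag["src"] for tag in soup.find_all("source", src=True)]

# Narrow the result down to the .mp4 link the question is after.
mp4_urls = [src for src in srcs_checked if src.endswith(".mp4")]
print(mp4_urls)
```

Which option to use is mostly a matter of taste; src=True keeps the filtering inside the find_all call, while Tag.get is handy when you want a default value for tags without the attribute.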
Upvotes: 1