Reputation: 543
I've probably spent too long on this already, but I can't understand why I'm getting FileNotFoundError: [Errno 2] No such file or directory: when the only difference I can see is the link. I'm using Beautiful Soup.
Objective: Download an image and save it in a different folder. This works fine except for some .jpg files. I've tried different types of paths and stripping the file names, but it's the same problem.
Test images:
http://img2.rtve.es/v/5437650?w=1600&preview=1573157283042.jpg # Not working
http://img2.rtve.es/v/5437764?w=1600&preview=1573172584190.jpg # Works perfectly
Here is the function:
def get_thumbnail():
    '''
    Download image and place in the images folder
    '''
    soup = BeautifulSoup(r.text, 'html.parser')
    # Get thumbnail image
    for preview in soup.findAll(itemprop="image"):
        preview_thumb = preview['src'].split('//')[1]
    # Download image
    url = 'http://' + str(preview_thumb).strip()
    path_root = Path(__file__).resolve().parents[1]
    img_dir = str(path_root) + '\\static\\images\\'
    urllib.request.urlretrieve(url, img_dir + show_id() + '_' + get_title().strip() + '.jpg')
Other functions used:
def show_id():
    for image_id in soup.findAll(itemprop="image"):
        preview_id = image_id['src'].split('/v/')[1]
        preview_id = preview_id.split('?')[0]
    return preview_id

def get_title():
    title = soup.find('title').get_text()
    return title
All I can work out is that the problem must be in finding the images folder for the first image, yet the second works perfectly.
This is the error I keep getting and it seems to be breaking at request.py
Thanks for any input.
Upvotes: 1
Views: 38
Reputation: 473903
It's quite likely that the "special characters" in the image filename are throwing urlretrieve() (and the open() call used inside it) off:
>>> from urllib import urlretrieve # Python 3: from urllib.request import urlretrieve
>>> url = "https://i.sstatic.net/1RUYX.png"
>>> urlretrieve(url, "test.png") # works
('test.png', <httplib.HTTPMessage instance at 0x10b284a28>)
>>> urlretrieve(url, "/tmp/test 07/11/2019.png") # does not work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 249, in retrieve
    tfp = open(filename, 'wb')
IOError: [Errno 2] No such file or directory: '/tmp/test 07/11/2019.png'
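To see why the second call fails, it may help to look at how such a path gets split. This is only a small sketch using the standard library's pathlib: the POSIX path is the one from the session above, while the Windows directory is a hypothetical stand-in for the static\images folder in the question. In both cases the slashes in the date are treated as directory separators, so open() ends up looking for a folder that does not exist:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Filename taken from the failing call above; the slashes come from the date.
filename = "test 07/11/2019.png"

# POSIX: "/tmp" exists, but "/tmp/test 07/11" most likely does not.
posix_target = PurePosixPath("/tmp") / filename
print(posix_target.parent)  # directory that open() expects to exist
print(posix_target.name)    # the only part treated as the actual file name

# Windows also accepts "/" as a separator, so the same thing happens there.
# "C:\project\static\images" is a hypothetical directory for illustration.
windows_target = PureWindowsPath(r"C:\project\static\images") / filename
print(windows_target.parent)
print(windows_target.name)
```

If the page title in the question contains a slash (or another character the filesystem rejects), that would explain why one image saves and the other does not, even though the folder is the same.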
In other words, the image titles must be sanitized before they are used as filenames for saving. I'd just "slugify" them to avoid the problem altogether. One way to do that is to use the slugify module:
import os
from slugify import slugify
image_filename = slugify(show_id() + '_' + get_title().strip()) + '.jpg'
image_path = os.path.join(img_dir, image_filename)
urllib.request.urlretrieve(url, image_path)
For instance, this is what slugify would do to the image name test 07/11/2019:
>>> slugify("test 07/11/2019")
'test-07-11-2019'
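If installing a third-party package is not an option, something similar can be approximated with the standard library alone. The helper below, safe_filename, is a hypothetical name of my own and not part of slugify or urllib; it is a rough sketch that drops accents, replaces every run of other characters with a hyphen, and lowercases the result. It will not match slugify's output in every case:

```python
import re
import unicodedata


def safe_filename(title, replacement="-"):
    """Return a filesystem-friendly version of `title` (hypothetical helper)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of characters other than letters, digits, or hyphens.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", replacement, ascii_title)
    # Trim replacement characters left at either end and lowercase the result.
    return cleaned.strip(replacement).lower()


print(safe_filename("test 07/11/2019"))  # test-07-11-2019
```

The result can then be joined to the image directory with os.path.join, the same way as in the slugify version above.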
Upvotes: 1