Reputation: 563
I'm new to Deep Learning and PyTorch, so please do bear with me if some questions seem silly or I'm not asking in the correct format. I was watching this video as part of a PyTorch series on Deep Learning: https://www.youtube.com/watch?v=8n-TGaBZnk4 . This video specifically is about ETL (using Fashion-MNIST dataset). I have a few questions on the video at 7:05.
Question 1: In the Fashion-MNIST subclass constructor we passed the argument 'root', which the instructor described as the location on disk where the data is located. Sorry, maybe this is a silly question, but is this where the data is located on the source server's disk (from the URL), or is it the path where you want to save the data locally on your computer?
Question 2: Also, for Fashion-MNIST, is the 'root' always the same location path, i.e. './data/FashionMNIST'?
Question 3: If the 'root' defines the path where the data is located on the source server, then where would it be downloaded locally? I checked my 'download' folder (I'm using a Windows 7 laptop) and couldn't find the files there.
Question 4: The video mentioned that, in subsequent calls, we should check whether the data has already been downloaded (i.e. when we pass the argument download=True).
4(a): What's a good approach to do this? Do we put an if statement in place to check for this? Or is there a smarter way of checking for downloaded data?
4(b): Also, what is meant by "subsequent calls"? Does it mean when we need to call the 'FashionMNIST' constructor again for the test_data download?
Question 5: Finally, I tried running the code below (which is the one in the video) on Spyder IDE (Python 3.5):
import torch
import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.FashionMNIST(
    root='./data/FashionMNIST'
    ,train=True
    ,download=True
    ,transform=transforms.Compose([
        transforms.ToTensor()
    ])
)
I got the output:
Traceback (most recent call last):
  File "<ipython-input-3-3ac000b9e90a>", line 10, in <module>
    transforms.ToTensor()
  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\mnist.py", line 68, in __init__
    self.download()
  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\mnist.py", line 136, in download
    makedir_exist_ok(self.raw_folder)
  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\utils.py", line 41, in makedir_exist_ok
    os.makedirs(dirpath)
  File "C:\Program Files\Anaconda3\lib\os.py", line 241, in makedirs
    mkdir(name, mode)
FileNotFoundError: [WinError 206] The filename or extension is too long: './data/FashionMNIST\\FashionMNIST\\raw'
Not sure why I got that error at the end. In addition I ran the code on Jupyter Notebook, as per the video, and it worked fine. But I'm wondering why it throws that error in Spyder IDE.
Many thanks in advance.
Upvotes: 2
Views: 1087
Reputation: 3816
No genuine question is a silly question. Answering the questions one by one:
Ans 1 & 2: root is the path on your local disk where the data will be saved. You can give any path you like; it will not cause an issue.
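To see where a relative root actually ends up on your machine, here is a small sketch using only the standard library. The root value is the one from the question, and the FashionMNIST/raw sub-folder layout is read off the traceback in the question rather than from the torchvision docs, so treat it as an assumption for your version. A relative path is resolved against the current working directory, which may not be the same in Spyder and in Jupyter.

```python
import os

# Same value as passed to the constructor in the question.
root = './data/FashionMNIST'

# A relative path is resolved against the current working directory.
abs_root = os.path.abspath(root)

# Assumed layout, taken from the traceback: <root>/FashionMNIST/raw
raw_folder = os.path.join(abs_root, 'FashionMNIST', 'raw')

print('working directory:', os.getcwd())
print('root resolves to: ', abs_root)
print('raw files go to:  ', raw_folder)
```

Running this in both IDEs shows whether they are pointing at the same folder, and it also explains why nothing shows up in the Windows 'Downloads' folder.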
Ans 3: The URLs etc. are defined within the torchvision source files, so the local path for the data is all you need to provide. To look at the URLs from which the data is downloaded, here is a link.
Ans 4: download=True merely gives it permission to download if the data doesn't exist. The downloader automatically checks whether the data already exists; if it does, it will not download again, even if download is set to True. Again, this happens in the background, so you don't have to worry about it.
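To make the "check first, download only if missing" idea concrete, here is a rough sketch of the pattern. It is not torchvision's actual code: ensure_downloaded and fake_fetch are hypothetical names made up for illustration, the file names are placeholders, and the "download" just writes a dummy file into a temporary folder. A "subsequent call" here simply means running the same call a second time.

```python
import os
import tempfile


def ensure_downloaded(folder, filenames, fetch):
    """Hypothetical helper: fetch only the files that are not on disk yet."""
    os.makedirs(folder, exist_ok=True)
    fetched = []
    for name in filenames:
        path = os.path.join(folder, name)
        if os.path.exists(path):
            continue  # already on disk, nothing to do
        fetch(name, path)
        fetched.append(name)
    return fetched


def fake_fetch(name, path):
    # Stand-in for a real download: writes a small placeholder file.
    with open(path, 'wb') as f:
        f.write(b'placeholder')


with tempfile.TemporaryDirectory() as tmp:
    files = ['train-images', 'train-labels']  # placeholder names
    first = ensure_downloaded(tmp, files, fake_fetch)   # first call fetches
    second = ensure_downloaded(tmp, files, fake_fetch)  # second call skips
    print('fetched on first call: ', first)
    print('fetched on second call:', second)
```

The second call finds the files already in place and fetches nothing, which is the behaviour you get for free when you leave download=True in the constructor.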
Ans 5: The issue isn't exactly a torch issue; it has more to do with how it is being compiled on Windows. The issue is discussed at length here & here.
Upvotes: 2