Reputation: 19
I am trying to download the CUB_200_2011 dataset in Colab using:
!wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
After running this, I got:
--2021-05-28 10:13:12-- http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
Resolving www.vision.caltech.edu (www.vision.caltech.edu)... 34.208.54.77
Connecting to www.vision.caltech.edu (www.vision.caltech.edu)|34.208.54.77|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view [following]
--2021-05-28 10:13:12-- https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view
Resolving drive.google.com (drive.google.com)... 74.125.195.102, 74.125.195.113, 74.125.195.138, ...
Connecting to drive.google.com (drive.google.com)|74.125.195.102|:443... connected.
HTTP request sent, awaiting response... 200 OK
**Length: unspecified [text/html]**
Saving to: ‘CUB_200_2011.tgz’
CUB_200_2011.tgz [ <=> ] 71.36K --.-KB/s in 0.03s
2021-05-28 10:13:13 (2.41 MB/s) - ‘CUB_200_2011.tgz’ saved [73069]
The length is unspecified and wget reports the content as text/html, so the saved file appears to be an HTML page rather than the archive. I cannot extract it; tar fails with an error:
!tar -xvzf CUB_200_2011.tgz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Is something wrong with the link, or is the problem elsewhere?
Upvotes: 0
Views: 994
Reputation: 389
It seems that the original authors redirected the dataset link to a Google Drive link, which broke many online tutorials. A public copy of the data is hosted by fast.ai and can be downloaded in an IPython session with the following line:
!wget https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz
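If you would rather stay in Python than shell out to wget, the same download can be sketched with the standard library. This is only a minimal sketch: the URL is the one from the line above, while the helper name `download` and the destination filename are my own choices.

```python
import shutil
import urllib.request
from pathlib import Path

# URL copied from the wget line above.
CUB_URL = "https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz"


def download(url: str, dest: str) -> Path:
    """Stream the resource at `url` into `dest` and return the saved path.

    `download` is a hypothetical helper name, not part of any library.
    """
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so the whole archive is never held in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

In a Colab cell you would then call `download(CUB_URL, "CUB_200_2011.tgz")`.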
Upvotes: 1
Reputation: 893
Look at the output carefully: the download URL redirects to Google Drive, which serves an HTML page instead of starting the download. The following command works around that. It passes the Google Drive file ID, sets CUB_200_2011.tgz as the output file, stores the session cookies in /tmp/cookies.txt (via --save-cookies and --keep-session-cookies), extracts the confirmation token from the first response so that the download is confirmed automatically, skips the certificate check with --no-check-certificate, and removes cookies.txt once the download is over.
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz && rm -rf /tmp/cookies.txt
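The sed part of that one-liner is the least readable piece, so here is a rough Python sketch of what it does: pull a `confirm=<token>` value out of the HTML and build the second URL from it. The function names are hypothetical, and the regex returns the first match in the text, whereas the sed expression works line by line. It also assumes the page still contains such a token.

```python
import re
from typing import Optional


def extract_confirm_token(html: str) -> Optional[str]:
    """Return the first 'confirm=<token>' value found in `html`, or None."""
    # Same character class as the sed expression in the command above.
    match = re.search(r"confirm=([0-9A-Za-z_]+)", html)
    return match.group(1) if match else None


def build_download_url(file_id: str, token: str) -> str:
    """Assemble the confirmed download URL used by the outer wget call."""
    return (
        "https://docs.google.com/uc?export=download"
        f"&confirm={token}&id={file_id}"
    )
```

If `extract_confirm_token` returns None, the page did not contain a token in this form and the outer request would again save HTML rather than the archive.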
Also, there is nothing wrong with your tar command; it should work once the download itself completes correctly. Hopefully this resolves your issue.
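To catch this kind of failure before tar does, you can check whether the saved file really is gzip data. A gzip stream starts with the two bytes 0x1f 0x8b, while an HTML page will not. A minimal sketch, with hypothetical helper names:

```python
import tarfile

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def extract_tgz(path: str, dest: str = ".") -> None:
    """Extract a .tgz archive into `dest`, refusing files that are not gzip."""
    if not is_gzip(path):
        raise ValueError(
            f"{path} is not gzip data; the download probably saved an HTML page"
        )
    with tarfile.open(path, "r:gz") as archive:
        archive.extractall(dest)
```

Running `is_gzip("CUB_200_2011.tgz")` on the 73069-byte file from the question should return False, which matches the "not in gzip format" error.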
Upvotes: 1