Reputation: 19

CUB_200_2011 Dataset Download link Error COLAB

I am trying to download the CUB_200_2011 dataset in Colab using:

!wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz

After running this, I got:

--2021-05-28 10:13:12--  http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
Resolving www.vision.caltech.edu (www.vision.caltech.edu)... 34.208.54.77
Connecting to www.vision.caltech.edu (www.vision.caltech.edu)|34.208.54.77|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view [following]
--2021-05-28 10:13:12--  https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view
 Resolving drive.google.com (drive.google.com)... 74.125.195.102, 74.125.195.113, 74.125.195.138, ...
 Connecting to drive.google.com (drive.google.com)|74.125.195.102|:443... connected.
 HTTP request sent, awaiting response... 200 OK
 **Length: unspecified [text/html]**
 Saving to: ‘CUB_200_2011.tgz’

 CUB_200_2011.tgz        [ <=>                ]  71.36K  --.-KB/s    in 0.03s  


 2021-05-28 10:13:13 (2.41 MB/s) - ‘CUB_200_2011.tgz’ saved [73069]

The length is unspecified and wget reports the content as HTML. I cannot extract the file either; I get an error:

!tar -xvzf CUB_200_2011.tgz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Is something wrong with the link, or is the problem elsewhere?

Upvotes: 0

Answers (2)

Ahmed Mohamedeen

Reputation: 389

It seems that the original authors redirected the dataset link to a Google Drive link, which broke many online tutorials. fast.ai provides a public copy of the data, which can be downloaded in an IPython session with the following line:

!wget https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz

Upvotes: 1

Suresh Gautam

Reputation: 893

Read the wget output carefully: the download URL now redirects to a Google Drive page instead of starting the download, so wget saves an HTML page rather than the archive. The command below works around this. It passes the Google Drive file ID, writes the output to CUB_200_2011.tgz, saves cookies (including session cookies, via --keep-session-cookies) to /tmp/cookies.txt for use during the download, extracts the confirmation token automatically, skips the certificate check with --no-check-certificate, and removes cookies.txt once the download is over.

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz && rm -rf /tmp/cookies.txt
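To see what the sed part of that command does in isolation, here is a minimal sketch. The sample HTML string is made up for illustration and is not actual Google Drive output; the sed expression is the one from the command above and assumes GNU sed, as found on Colab. In Colab this could be run in a %%bash cell.

```shell
# Made-up sample of a link that a confirmation page might contain (an assumption).
sample='<a href="/uc?export=download&amp;confirm=AbC_123&amp;id=FILE_ID">Download anyway</a>'

# Same sed expression as in the command above: print only the value after "confirm=".
# -n suppresses normal output; the trailing p prints lines where the substitution matched.
token="$(printf '%s\n' "$sample" | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')"

echo "confirm token: $token"

# If the page contains no "confirm=" parameter, sed prints nothing and the token is empty.
no_token="$(printf '%s\n' '<html>no token here</html>' | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')"
```

If the token comes back empty, the outer wget request is sent with an empty confirm value, which is worth checking when the download still yields HTML.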

Also, there is nothing wrong with your tar command; it should work once the download completes correctly. Hopefully this resolves your issue.
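One way to confirm the download before extracting is to check that the file starts with the gzip magic bytes (1f 8b). The following is a rough sketch, not part of the original commands; is_gzip is a hypothetical helper name, and in Colab the snippet could go in a %%bash cell.

```shell
# Hypothetical helper: succeeds only if the file begins with the gzip magic bytes 1f 8b.
is_gzip() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

archive="CUB_200_2011.tgz"

if [ -f "$archive" ] && is_gzip "$archive"; then
  # Looks like a real gzip archive, so extract it.
  tar -xzf "$archive"
else
  # A saved HTML page (as in the question) ends up here instead of failing inside tar.
  echo "$archive is missing or not a gzip archive (possibly an HTML page); download it again." >&2
fi
```

A file that fails this check, such as the 73069-byte HTML page saved in the question, produces the "not in gzip format" error when passed to tar.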

Upvotes: 1
