Viral Parmar

File Partially Downloaded with urllib.request.urlretrieve

I have this code, which tries to retrieve a file from a GitHub repository:

import os
import tarfile
from six.moves import urllib
import urllib.request

DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml/tree/master/"
HOUSING_PATH = os.path.join("datasets", "housing").replace("\\","/")
print(HOUSING_PATH)
HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH
print(HOUSING_URL)
print(os.getcwd())

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz").replace("\\","/")
    print(tgz_path)
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

fetch_housing_data()

After executing the code I got this error: ReadError: file could not be opened successfully. I compared the size of the actual file with the size of the file downloaded by this code and found that the file is only partially downloaded. Is there any way to download the whole file? Thanks in advance.
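One way to check what was actually saved is to look at the first bytes of housing.tgz: every gzip file (and therefore every .tgz archive) starts with the bytes 0x1f 0x8b. The sketch below is a hypothetical helper, sniff_download, that reports whether a file looks like a gzip archive, an HTML page, or something else; the HTML check is only a heuristic.

```python
def sniff_download(path):
    """Guess what a downloaded file really is from its first bytes.

    Hypothetical helper: returns "gzip", "html" or "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(512)
    # Every gzip stream (and so every .tgz archive) starts with 0x1f 0x8b.
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    # Heuristic: a web page usually starts, after optional whitespace,
    # with a doctype declaration or an <html> tag.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

With the question's code, sniff_download("datasets/housing/housing.tgz") would show whether the saved file is really a gzip archive.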

Answers (1)

Viral Parmar

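Finally I found the problem: it was the link I was using to retrieve the file. I didn't know that for files in GitHub repositories you have to use the raw link together with the file name (leaving out the file name gives a 404 error). A github.com/.../tree/... URL points to GitHub's HTML page for that path, so urlretrieve saved that web page as housing.tgz. That is why the file looked smaller than the real archive and why tarfile could not open it.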
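The change from a github.com/.../tree/... address to a raw.githubusercontent.com address follows a fixed pattern: owner, repository, branch and file path stay the same, and only the host and the tree (or blob) segment change. A minimal sketch of that rewrite as a hypothetical helper, to_raw_github_url, assuming the branch name contains no slash and the path names a file rather than a directory (directories have no raw URL):

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Rewrite a github.com file URL into its raw.githubusercontent.com form.

    Hypothetical helper. It only rewrites the string; it does not check that
    the path is a file, and it assumes the branch name has no "/" in it.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError("not a github.com URL: " + url)
    # Expected path: /<owner>/<repo>/<blob|tree>/<branch>/<file path...>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] not in ("blob", "tree"):
        raise ValueError("expected /<owner>/<repo>/blob|tree/<branch>/<path>: " + url)
    owner, repo, _, branch = segments[:4]
    file_path = "/".join(segments[4:])
    # Raw layout: /<owner>/<repo>/<branch>/<file path>
    raw_path = "/".join(["", owner, repo, branch, file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

For example, to_raw_github_url("https://github.com/ageron/handson-ml/tree/master/datasets/housing/housing.tgz") gives the raw URL used below, while the bare ".../tree/master/" root is rejected because it names no file.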

So a small modification is needed in the code posted in my question. Change the link from

DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml/tree/master/"

to this:

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"

and change this

HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH

to

HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"  # the actual file name is needed
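With both changes applied, the question's script could look like the sketch below. It keeps the original structure, but as assumptions of this sketch (not part of the fix described here) it creates the directory with exist_ok=True, checks the download with tarfile.is_tarfile before extracting, and drops the .replace("\\", "/") calls, since HOUSING_PATH is no longer used to build the URL.

```python
import os
import tarfile
import urllib.request

# Raw URL; the archive's file name must be part of it.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download the housing archive and extract it into housing_path."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    # Fail with a clear message if what was saved is not a tar archive,
    # e.g. an HTML page returned for a non-raw URL.
    if not tarfile.is_tarfile(tgz_path):
        raise ValueError(f"{housing_url} did not return a tar archive")
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
```

Calling fetch_housing_data() with no arguments downloads the archive and extracts it into datasets/housing, as in the question.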

Thank you!