Reputation: 488
I have this code, which tries to retrieve a file from a GitHub repository.
import os
import tarfile
from six.moves import urllib
import urllib.request
DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml/tree/master/"
HOUSING_PATH = os.path.join("datasets", "housing").replace("\\","/")
print(HOUSING_PATH)
HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH
print(HOUSING_URL)
print(os.getcwd())
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz").replace("\\","/")
    print(tgz_path)
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
fetch_housing_data()
After executing the code I got this error: ReadError: file could not be opened successfully. I compared the size of the actual file with the size of the file downloaded by this code and found that the file is only partially downloaded. Is there any way to download the whole file? Thanks in advance.
Upvotes: 3
Views: 947
Reputation: 488
I finally found the problem. It was the link I was using to retrieve the file. I didn't know that for GitHub repositories the raw link must be used, together with the file name (leaving out the file name gives a 404 error).
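To confirm this kind of problem before changing anything, it can help to look at what was actually saved to disk. The snippet below is only a rough sketch: describe_download is a hypothetical helper name (not part of the original code), and it assumes that a non-raw GitHub link typically returns an HTML page rather than the archive.

```python
import tarfile


def describe_download(path):
    """Return a short description of what was actually saved at `path`.

    Hypothetical helper for illustration only.
    """
    # tarfile.is_tarfile returns True only if tarfile can open the file.
    if tarfile.is_tarfile(path):
        return "tar archive"
    # Otherwise peek at the first bytes to see whether it looks like HTML.
    with open(path, "rb") as f:
        head = f.read(256).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page"
    return "unknown"
```

Calling describe_download("datasets/housing/housing.tgz") after the failed run should show whether the saved file is a web page instead of the archive, which would also explain the size mismatch.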
So a small modification is needed in the code posted in my question. Change the link from
DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml/tree/master/"
to this:
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
And this
HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH
to
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"  # the actual file name is needed
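Putting both changes together, the code from the question could look roughly like the sketch below. The URL and names are the ones from this post; using os.makedirs(..., exist_ok=True), a with block for the archive, and returning the archive path are my own simplifications, not part of the original code.

```python
import os
import tarfile
import urllib.request

# Raw link root, as described above.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
# The URL must include the actual file name.
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create the target directory if it does not exist yet (assumed simplification).
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    # Download the archive to the local path.
    urllib.request.urlretrieve(housing_url, tgz_path)
    # Extract the archive next to it; the with block closes the file afterwards.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
    return tgz_path
```

Calling fetch_housing_data() with no arguments then downloads and extracts the archive into datasets/housing under the current working directory.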
Thank you !
Upvotes: 3