Reputation: 310
I would like to know the most efficient way to test whether a large file exists locally (without loading it into memory), and to download it if it doesn't exist (or isn't readable). The goal is to load the data into a pandas DataFrame.
I wrote the snippet below, which works (tested with a small file). What about correctness and Pythonic style?
url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv" # 4.7kB
file = "./test_file.csv"
try:
os.open( file, os.O_RDONLY)
df_data = pd.read_csv( file, index_col=0)
except:
df_data = pd.read_csv( url, index_col=0)
df_data.to_csv( file)
Upvotes: 0
Views: 7226
Reputation: 310
os.path.isfile(file) seems to me the best solution: check before downloading a huge file:
import os
import urllib.request

if not os.path.isfile(file):
    urllib.request.urlretrieve(url, file)  # download only when no local copy exists
df_data = pd.read_csv(file, index_col=0)
It's slower than loading it directly into memory from the URL (download to disk, then read into memory), but safer in my situation...
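For a genuinely huge file it may also be worth streaming the download to disk in chunks, so the file is never held in memory during the transfer. A minimal sketch, assuming Python 3's urllib.request and shutil from the standard library:
import os
import shutil
import urllib.request

import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
file = "./test_file.csv"

if not os.path.isfile(file):
    # Copy the response body to disk in chunks; only one buffer
    # is in memory at a time, regardless of file size
    with urllib.request.urlopen(url) as response, open(file, "wb") as out:
        shutil.copyfileobj(response, out)

df_data = pd.read_csv(file, index_col=0)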
Thanks to all.
Upvotes: 0
Reputation: 862481
I think you can use try and catch FileNotFoundError:
url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv" # 4.7kB
file = "./test_file.csv"
try:
df_data = pd.read_csv(file, index_col=0)
except FileNotFoundError:
df_data = pd.read_csv(url, index_col=0)
df_data.to_csv(file)
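Since the question also mentions the file existing but not being readable, one possible variant (a sketch; which exceptions should trigger the fallback is an assumption) catches PermissionError as well:
try:
    df_data = pd.read_csv(file, index_col=0)
except (FileNotFoundError, PermissionError):
    # Missing or unreadable locally: fall back to the URL, then try
    # to cache a fresh copy (to_csv may itself fail on an unwritable path)
    df_data = pd.read_csv(url, index_col=0)
    df_data.to_csv(file)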
Upvotes: 4
Reputation: 4872
You can check if the file exists, and load from a URL if it does not:
import os
import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
f = "./test.csv"

if os.path.exists(f):
    df = pd.read_csv(f)
else:
    df = pd.read_csv(url)
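Note that this version never writes the downloaded data to disk, so each run without a local copy downloads again. If caching is wanted, a small sketch of the addition:
if os.path.exists(f):
    df = pd.read_csv(f)
else:
    df = pd.read_csv(url)
    df.to_csv(f)  # cache the download so the next run reads the local file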
Upvotes: 0