alEx

Reputation: 310

Best way to test if large file exists

I would like to know the most efficient way to test whether a large file exists locally (without loading it into memory). If it doesn't exist (or isn't readable), then download it. The goal is to load the data into a pandas DataFrame.

I wrote the snippet below, which works (tested with a small file). What about correctness and Pythonic programming?

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv" # 4.7kB  
file = "./test_file.csv" 

try:
    os.open( file, os.O_RDONLY)
    df_data = pd.read_csv( file, index_col=0)

except: 
    df_data = pd.read_csv( url, index_col=0)
    df_data.to_csv( file)

Upvotes: 0

Views: 7226

Answers (3)

alEx

Reputation: 310

os.path.isfile(file) seems to me the best solution: it checks before downloading a huge file:

if not os.path.isfile(file):
    urllib.urlretrieve(url, file)
df_data = pd.read_csv(file, index_col=0)

It's slower than loading the data directly into memory from the URL (download to disk, then load into memory), but safer in my situation...
Thanks to all.
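
For reference, a minimal Python 3 sketch of the same check-then-download approach (assuming the question's url and file names; in Python 3, urlretrieve lives in urllib.request):

import os
import urllib.request

import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
file = "./test_file.csv"

# Download only if the file is not already on disk.
if not os.path.isfile(file):
    urllib.request.urlretrieve(url, file)

df_data = pd.read_csv(file, index_col=0)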

Upvotes: 0

jezrael

Reputation: 862481

I think you can use a try/except block and catch FileNotFoundError:

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv" # 4.7kB  
file = "./test_file.csv" 

try:
    df_data = pd.read_csv(file, index_col=0)

except FileNotFoundError: 
    df_data = pd.read_csv(url, index_col=0)
    df_data.to_csv(file)

Upvotes: 4

Robbie

Reputation: 4872

You can check if the file exists, and load from a URL if it does not:

import os
import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
f = "./test.csv"

if os.path.exists(f):
    df = pd.read_csv(f)
else:
    df = pd.read_csv(url)
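
If the data should also be cached for later runs (as the question's own snippet does with to_csv), a small extension of this pattern, assuming the same url and f as above, could be:

import os
import pandas as pd

url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"
f = "./test.csv"

if os.path.exists(f):
    df = pd.read_csv(f)
else:
    df = pd.read_csv(url)
    df.to_csv(f)  # cache locally so the next run skips the download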

Upvotes: 0
