Claudiu Creanga

Reputation: 8386

How to access an HDF5 file (or any file) efficiently

I have an HDF5 file that contains about 10 databases (DataFrames) that I need in various places across my project (different modules).

At the moment I use a simple function that will give me the database that I want:

import pandas as pd

def get_hdf5_dataframe(dataframe_name: str) -> pd.DataFrame:
    db = pd.HDFStore("/database.h5")  # opens the file on every call
    df = db[dataframe_name]
    db.close()  # needs to be closed every time I access it

    return df

However, this is not efficient, as the program has to open and read from the file on every call.

If I use the lru_cache decorator, the program will still open the file 10 times, once for each database.

What would be an efficient way to get the databases while loading the file only once, and to make sure I close the HDF5 file after reading it?

Upvotes: 0

Views: 185

Answers (1)

avigil

Reputation: 2246

You could store the opened file as a global:

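import pandas as pd

db = None  # module-level handle, opened lazily on first use

def get_hdf5_dataframe(dataframe_name: str) -> pd.DataFrame:
    global db
    if db is None:
        # First call: open the store and keep it open for later calls
        db = pd.HDFStore("/database.h5")
    df = db[dataframe_name]

    return df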

This opens the file only once, on first access (although the file will then stay open for the life of your program). Use globals with caution, though: they can make life difficult if overused.

Upvotes: 1
