Reputation: 799
I have a dataset that needs to be loaded from a database. I'm wondering what the difference is between the following two ways of handling it.
import pandas as pd
from dataclasses import dataclass, field

@dataclass
class A:
    df: pd.DataFrame = field(init=False)

    def load_df(self):
        self.df = query_from_database()
and
import pandas as pd
from dataclasses import dataclass, field
from functools import cached_property

@dataclass
class A:
    @cached_property
    def df(self):
        df = query_from_database()
        return df
Upvotes: 0
Views: 42
Reputation: 11237
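In method 2:
Once the property has been accessed, the result is cached on the instance, and every later access returns that same object without querying again. Reloading takes an explicit step (deleting the cached attribute with del a.df, or creating a new instance), so this approach suits data that is read often and rarely changes. The cached DataFrame stays in memory for as long as the instance holds it, which can be memory intensive for large results.
In method 1: you have more control over when the data is loaded and reloaded. The DataFrame is also held in memory as an attribute of the instance, but calling load_df() again refreshes it without destroying the instance. Until load_df() has been called, accessing df raises AttributeError.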
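To illustrate the reload behaviour of method 2, here is a minimal sketch. The query function is a hypothetical stand-in that returns a plain list and counts its calls, so the example runs without pandas or a real database. It relies on the documented behaviour that a cached_property value can be cleared by deleting the attribute.

```python
from functools import cached_property

calls = 0  # counts how often the stand-in "database" is queried


def query_from_database():
    """Hypothetical stand-in for the real query; returns a plain list."""
    global calls
    calls += 1
    return [calls]


class A:
    @cached_property
    def df(self):
        return query_from_database()


a = A()
first = a.df      # runs the query and caches the result on the instance
again = a.df      # served from the cache, no second query
del a.df          # drop the cached value, the instance itself survives
reloaded = a.df   # runs the query again

print(calls)      # number of times the stand-in query ran
```

So a reload is possible in both methods; the difference is mainly whether the load is triggered explicitly (load_df()) or lazily on first access.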
Upvotes: 0
Reputation: 1
import pandas as pd
import numpy as np
from dataclasses import dataclass, field
from functools import cached_property


def query_from_database():
    print("query_from_database")
    return pd.DataFrame(np.zeros((3, 4)))


@dataclass
class A:
    df: pd.DataFrame = field(init=False)

    def load_df(self):
        self.df = query_from_database()


class B:
    @cached_property
    def df(self):
        df = query_from_database()
        return df


if __name__ == '__main__':
    a = A()
    a.load_df()
    a.load_df()
    # prints 'query_from_database' twice

    # With @cached_property, df is accessed like an attribute of a B instance, the same way as self.df.
    # The first time you access b.df, the body of df(self) is executed.
    # After that first call, the returned DataFrame is cached on the instance.
    # Accessing b.df again does not execute the function body;
    # the cached value is returned directly.
    b = B()
    b.df
    b.df
    # prints 'query_from_database' only once
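As a small addition, the sketch below shows where the cached value ends up: cached_property stores it in the instance's own __dict__, so each instance has its own cache. The property body here is a hypothetical stand-in that returns a plain object instead of querying a database.

```python
from functools import cached_property


class B:
    @cached_property
    def df(self):
        # Hypothetical stand-in for query_from_database()
        return object()


b1, b2 = B(), B()

before = "df" in vars(b1)   # nothing cached before the first access
value = b1.df               # first access computes and stores the value
after = "df" in vars(b1)    # the value now lives in b1.__dict__
other = "df" in vars(b2)    # b2 has not been accessed, so it has no cache

print(before, after, other)
```

Because the cache is an ordinary instance attribute, it is released together with the instance, and one instance's cached data is never shared with another.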
Upvotes: 0