Reputation: 1579
I have a big pandas DataFrame loaded into memory and am trying to use memory more efficiently.
For this purpose, I won't need the full data frame once I subset only the rows I am interested in:
DF = pd.read_csv("Test.csv")
DF = DF[DF['A'] == 'Y']
I have already tried the solution above, but I am not sure it is the most effective. Is it the most efficient approach for memory usage? Please advise.
Upvotes: 0
Views: 291
Reputation: 210882
You can try the following trick (if you can read the whole CSV file into memory):
DF = pd.read_csv("Test.csv").query("A == 'Y'")
Alternatively, you can read your data in chunks using read_csv().
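A minimal sketch of the chunked approach, using a small in-memory CSV as a stand-in for your "Test.csv" (the column names and chunk size are illustrative assumptions):

```python
import io
import pandas as pd

# Hypothetical sample data standing in for "Test.csv"
csv_data = io.StringIO(
    "A,B\n"
    "Y,1\n"
    "N,2\n"
    "Y,3\n"
    "N,4\n"
)

# Read the file in chunks and keep only matching rows from each chunk,
# so the full DataFrame is never held in memory at once.
# For a real file, replace csv_data with the path and pick a larger
# chunksize (e.g. 100_000).
chunks = pd.read_csv(csv_data, chunksize=2)
DF = pd.concat(chunk[chunk['A'] == 'Y'] for chunk in chunks)
print(DF)
```

This way the peak memory footprint is roughly one chunk plus the accumulated filtered rows, rather than the entire file.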
But I would strongly recommend saving your data in HDF5 Table format (you may also want to compress it) - then you could read your data conditionally, using the where parameter of the read_hdf() function.
For example:
df = pd.read_hdf('/path/to/my_storage.h5', 'my_data', where="A == 'Y'")
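A round-trip sketch, assuming an illustrative DataFrame and a temporary file path (requires the PyTables package; the key name 'my_data' and the blosc compression settings are just example choices):

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample data standing in for the original CSV
df = pd.DataFrame({'A': ['Y', 'N', 'Y', 'N'], 'B': [1, 2, 3, 4]})

path = os.path.join(tempfile.mkdtemp(), 'my_storage.h5')

# format='table' enables conditional reads; data_columns=['A'] makes
# column A queryable in a `where` expression; complib/complevel compress.
df.to_hdf(path, key='my_data', format='table', data_columns=['A'],
          complib='blosc', complevel=9)

# Only the rows where A == 'Y' are read from disk
subset = pd.read_hdf(path, 'my_data', where="A == 'Y'")
print(subset)
```

Note that the `where` filter only works on columns listed in data_columns (or the index), so decide which columns you will query on when writing the store.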
Here you can find some examples and a comparison of usage for different storage options
Upvotes: 1