Efficient storage of large matrix on HDD

Question

I have many large (1 GB+) matrices of doubles (floating-point values), many of whose elements are 0.0, that need to be stored efficiently. I intend to keep the double type, since some of the elements do need double precision (but I would consider changing this if it led to a significant space saving). A string header is optional. The matrices have no missing elements, NaNs, NAs, nulls, etc.: every element is a double.

Some columns will be sparse, others will not be. The proportion of columns that are sparse will vary from file to file.

What is a space-efficient alternative to CSV? I need to parse these matrices quickly into R, Python and Java, so a file format specific to a single language is not appropriate. Access may need to be by row or by column.

I am also not looking for a commercial solution.

My main objective is to save HDD space without blowing out I/O times. RAM usage once imported is not the primary consideration.
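To make the data shape concrete, here is a minimal sketch of the kind of matrix I mean and the baselines I am comparing against. Everything in it is illustrative: the function names (make_matrix, csv_bytes, raw_bytes, compressed_bytes, decode) are made up for this example, the 5% density and the 5-of-8 sparse columns are assumptions, and zlib over raw column-major doubles is only a reference point, not something I am proposing as the answer. It uses only the Python standard library.

```python
import array
import random
import zlib


def make_matrix(n_rows, n_cols, sparse_cols, density=0.05, seed=0):
    """Build a matrix as a list of columns of doubles.

    Columns whose index is in sparse_cols are mostly 0.0 (assumed
    density: about 5% non-zero); all other columns are fully dense.
    """
    rng = random.Random(seed)
    cols = []
    for j in range(n_cols):
        if j in sparse_cols:
            col = [rng.random() if rng.random() < density else 0.0
                   for _ in range(n_rows)]
        else:
            col = [rng.random() for _ in range(n_rows)]
        cols.append(col)
    return cols


def csv_bytes(cols):
    """Encode the matrix row by row as CSV text (the current baseline)."""
    rows = zip(*cols)
    text = "\n".join(",".join(repr(v) for v in row) for row in rows)
    return text.encode("ascii")


def raw_bytes(cols):
    """Encode the matrix as column-major 8-byte doubles.

    Note: array uses the machine's native byte order, so a real
    cross-language format would have to fix the byte order explicitly.
    """
    out = bytearray()
    for col in cols:
        out += array.array("d", col).tobytes()
    return bytes(out)


def compressed_bytes(cols):
    """zlib-compress the raw column-major encoding."""
    return zlib.compress(raw_bytes(cols), 6)


def decode(blob, n_rows, n_cols):
    """Invert compressed_bytes: return the list of columns."""
    flat = array.array("d")
    flat.frombytes(zlib.decompress(blob))
    return [list(flat[j * n_rows:(j + 1) * n_rows]) for j in range(n_cols)]


n_rows, n_cols = 1000, 8
cols = make_matrix(n_rows, n_cols, sparse_cols={0, 1, 2, 3, 4})

print("csv bytes:       ", len(csv_bytes(cols)))
print("raw bytes:       ", len(raw_bytes(cols)))
print("compressed bytes:", len(compressed_bytes(cols)))
```

The three printed sizes are the kind of comparison I care about, together with how long each encoding takes to read back. The round trip through decode is lossless, which is the property I need if the elements stay as doubles.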
