Reputation: 1
I have some dataframes which are loaded from different npz files. I combine all the data into a single dataframe and apply some processing to it. Now I want to save the new combined dataframe into a new npz file. How do I do that?
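Roughly, the loading and combining step looks like this (the file names and the array key are placeholders):
import numpy as np
import pandas as pd
paths = ['part1.npz', 'part2.npz', 'part3.npz']
frames = []
for p in paths:
    with np.load(p) as npz:
        # each npz file holds one array under the key 'data'
        frames.append(pd.DataFrame(npz['data']))
df = pd.concat(frames, ignore_index=True)
# ... processing on df ...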
Since the dataframe is large (5000 rows, 30 columns), I would also like to know the most efficient way of doing so.
I tried searching the internet for a solution, but the results are only about how to convert a pandas dataframe to numpy data.
Upvotes: 0
Views: 2279
Reputation: 121
If the data is huge and has numpy arrays as entries, it is recommended to store it in one of the following formats. Which one to use depends on your requirements, but any of them will serve the need here.
Here is a way to store it as a pickle file and read it back:
df.to_pickle('df.pkl')
df = pd.read_pickle('df.pkl')
Here is a way to store it as an HDF5 file and read it back:
df.to_hdf('df.h5', key='df', mode='w')
df = pd.read_hdf('df.h5', 'df')
Here is a way to store it as a Parquet file and read it back:
df.to_parquet('df.parquet.gzip', compression='gzip')
df = pd.read_parquet('df.parquet.gzip')
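Note that to_hdf requires the PyTables package (tables) and to_parquet requires a Parquet engine such as pyarrow or fastparquet; to_pickle needs no extra dependency, but pickle files should only be loaded from trusted sources.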
Upvotes: 3
Reputation: 5949
If the columns of df have distinct dtypes, you need to pass them as separate arrays:
np.savez('out', **{c: df[c].values for c in df.columns})
data = np.load('out.npz')
df = pd.DataFrame({file: data[file] for file in data.files})
For string dtypes you also need to allow pickling when loading:
data = np.load('out.npz', allow_pickle=True)
For more compact files you can also replace np.savez with np.savez_compressed.
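Putting those pieces together, here is a minimal round-trip sketch (the column names and values are just illustrative):
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, 1.5, 2.5], 'c': ['x', 'y', 'z']})
# save each column as its own array so the distinct dtypes survive
np.savez_compressed('out.npz', **{c: df[c].values for c in df.columns})
# rebuild the dataframe; allow_pickle=True is needed for the object-dtype string column
with np.load('out.npz', allow_pickle=True) as data:
    df2 = pd.DataFrame({name: data[name] for name in data.files})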
Upvotes: 0
Reputation: 726
It seems that the best solution for your problem is to convert your dataframe to a numpy array and then save it:
np.savez(file, df.to_numpy())
Here file is the path of the file you want to save your data to, and df is the dataframe holding your data.
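To read it back, note that np.savez stores arrays passed positionally under the keys 'arr_0', 'arr_1', and so on, and that df.to_numpy() drops the column names and index, so you have to restore those yourself. A minimal sketch, assuming the file name data.npz:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})  # placeholder data
np.savez('data.npz', df.to_numpy())
# positional arrays are stored under 'arr_0'; column names must be supplied again
with np.load('data.npz') as data:
    restored = pd.DataFrame(data['arr_0'], columns=df.columns)
Be aware that to_numpy() on a frame with mixed dtypes upcasts everything to a common dtype (often object), in which case np.load also needs allow_pickle=True.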
Upvotes: 1