Eino Gourdin
Eino Gourdin

Reputation: 4507

Pandas - save multiple CSV in a zip archive

I need to save multiple dataframes in CSV, all in a same zip file. Is it possible without making temporary files?

I tried using zipfile:

with zipfile.ZipFile("archive.zip", "w") as zf:
    with zf.open(f"file1.csv", "w") as buffer:
        data_frame.to_csv(buffer, mode="wb")

This works with to_excel but fails with to_csv as as zipfiles expects binary data and to_csv writes a string, despite the mode="wb" parameter:

.../lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 283, in _save_header

writer.writerow(encoded_labels)

.../lib/python3.8/zipfile.py", line 1137, in write

TypeError: a bytes-like object is required, not 'str'

On the other hand, I tried using the compression parameter of to_csv, but the archive is overwritten, and only the last dataframe remains in the final archive.

If no other way, I'll use temporary files, but I was wondering if someone have an idea to allow to_csv and zipfile work together.

Thanks in advance!

Upvotes: 5

Views: 2612

Answers (2)

Rishabh Gupta
Rishabh Gupta

Reputation: 87

For saving multiple excel files from dataframe in a zip file

import zipfile
from zipfile import ZipFile
import pandas as pd

df1 = pd.DataFrame({"x":[1,2,3]})
df2 = pd.DataFrame({"y":[4,5,6]})
df3 = pd.DataFrame({"z":[7,8,9]})

with zipfile.ZipFile("rishabh.zip", "w") as zf:
      with zf.open(f"check1.xlsx", "w") as buffer:
         df1.to_excel(buffer,index=False)
      with zf.open(f"check2.xlsx", "w") as buffer:
         df2.to_excel(buffer,index=False)
      with zf.open(f"check3.xlsx", "w") as buffer:
         df3.to_excel(buffer, index=False)

Upvotes: 3

Daweo
Daweo

Reputation: 36370

I would approach this following way

import io
import pandas as pd
df = pd.DataFrame({"x":[1,2,3]})
string_io = io.StringIO()
df.to_csv(string_io)
string_io.seek(0)
df_bytes = string_io.read().encode('utf-8')

as df_bytes is bytes it should now work with zipfile. Edit: after looking into to_csv help I found simpler way, to get bytes namely:

import pandas as pd
df = pd.DataFrame({"x":[1,2,3]})
df_bytes = df.to_csv().encode('utf-8')

Upvotes: 4

Related Questions