Reputation: 73
I am trying to convert a big (~2 GB) SPSS (.sav) file into CSV using Python.
If the file is smaller than about 500 MB, there is no problem doing the following:
import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")
But in this case, I get a MemoryError...
I am looking for solutions, not necessarily in Python. But I don't have an SPSS license, so I must convert the file with another tool.
Upvotes: 1
Views: 2126
Reputation: 3407
You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV:
import pyreadstat

fpath = "path/to/stdFile.sav"
outpath = "stdFile.csv"

# chunksize determines how many rows are read per chunk
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)

cnt = 0
for df, meta in reader:
    # write on the first iteration, append on subsequent ones
    if cnt > 0:
        wmode = "a"
        header = False
    else:
        wmode = "w"
        header = True
    # write the chunk to the csv
    df.to_csv(outpath, mode=wmode, header=header)
    cnt += 1
More information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
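If you want the output to match the call in the question, to_csv also accepts the same encoding argument, and index=False drops pandas' index column; a minimal variation of the write step (same loop as above, only this line changes):

    df.to_csv(outpath, mode=wmode, header=header, index=False, encoding="utf-8-sig")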
Upvotes: 2
Reputation: 2139
First use the savReaderWriter module to convert the .sav file into a structured array, then use numpy to write the structured array to CSV:
pip install savReaderWriter
import savReaderWriter
import numpy as np

# read the .sav file into a numpy structured array,
# using "outfile.dat" as a memory-mapped file to keep RAM usage low
reader_np = savReaderWriter.SavReaderNp("stdFile.sav")
array = reader_np.to_structured_array("outfile.dat")

# write the structured array out as csv
np.savetxt("stdFile.csv", array, delimiter=",")
reader_np.close()
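One caveat worth checking against your data: np.savetxt's default format is "%.18e", which only works for purely numeric columns. If the .sav file contains string variables, you may need to pass an explicit format, for example:

    # "%s" formats every field as a string, which handles mixed-type columns
    np.savetxt("stdFile.csv", array, delimiter=",", fmt="%s")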
Upvotes: 0