Manuel Quintana

Reputation: 73

How to convert large .sav file into csv file

I am trying to convert a big ~2GB SPSS (.SAV) file into CSV using Python.

For files smaller than ~500 MB, the following works fine:

import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")

But with the 2 GB file, I get a MemoryError...

I am looking for solutions, not necessarily in Python. But I don't have an SPSS license, so I have to convert the file with another tool.

Upvotes: 1

Views: 2126

Answers (2)

Otto Fajardo

Reputation: 3407

You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV:

import pyreadstat
fpath = "path/to/stdFile.sav"
outpath = "stdFile.csv"
# chunksize determines how many rows are read per chunk
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)

for cnt, (df, meta) in enumerate(reader):
    # write (with header) on the first chunk, append (without header) afterwards
    wmode = "w" if cnt == 0 else "a"
    df.to_csv(outpath, mode=wmode, header=(cnt == 0))


More information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
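The write-then-append pattern used above can be checked on any DataFrame without a .sav file; the chunk size of 10 and the column names below are arbitrary choices for illustration:

```python
import pandas as pd

def write_chunks_to_csv(chunks, outpath):
    """Write the first chunk with a header, append the rest without one."""
    for i, df in enumerate(chunks):
        mode = "w" if i == 0 else "a"
        df.to_csv(outpath, mode=mode, header=(i == 0), index=False)

# simulate chunked reading by slicing a small DataFrame
df = pd.DataFrame({"id": range(25), "val": [x * 2 for x in range(25)]})
chunks = [df.iloc[i:i + 10] for i in range(0, len(df), 10)]
write_chunks_to_csv(chunks, "out.csv")

# round-trip: all 25 rows survive, with a single header line
print(len(pd.read_csv("out.csv")))  # → 25
```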

Upvotes: 2

Mahsa Hassankashi

Reputation: 2139

First use the savReaderWriter module to read the .sav file into a structured array, then use numpy to write that array to CSV:

pip install savReaderWriter


import savReaderWriter
import numpy as np

reader_np = savReaderWriter.SavReaderNp("stdFile.sav")
# passing a filename memory-maps the data to disk instead of
# loading everything into RAM
array = reader_np.to_structured_array("outfile.dat")
# fmt="%s" is needed because structured arrays typically mix numeric and
# string columns, which the default float format cannot handle
np.savetxt("stdFile.csv", array, delimiter=",", fmt="%s")
reader_np.close()

Upvotes: 0
