Reputation: 73
I am trying to convert a big (~2 GB) SPSS (.sav) file into CSV using Python.
If the file is smaller than about 500 MB, there is no problem doing the following:
import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")
But in this case, I get a MemoryError...
I am looking for solutions, not necessarily in Python. But I don't have an SPSS license, so I must convert the file with another tool.
Upvotes: 1
Views: 2126
Reputation: 3407
You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV:
import pyreadstat

fpath = "path/to/stdFile.sav"
outpath = "stdFile.csv"

# chunksize determines how many rows are read per chunk
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)

cnt = 0
for df, meta in reader:
    # write on the first iteration, append on subsequent ones
    if cnt > 0:
        wmode = "a"
        header = False
    else:
        wmode = "w"
        header = True
    # write the chunk to the csv
    df.to_csv(outpath, mode=wmode, header=header)
    cnt += 1
More information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
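If you want the output to match the call in the question, to_csv also accepts the same encoding argument, and index=False drops pandas' index column; a minimal variation of the write step (same loop as above, only this line changes):

    df.to_csv(outpath, mode=wmode, header=header, index=False, encoding="utf-8-sig")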
Upvotes: 2
Reputation: 2139
First use the savReaderWriter module to convert the .sav file into a structured array, then use numpy to write the structured array to CSV:
pip install savReaderWriter
import savReaderWriter
import numpy as np

# read the .sav file into a numpy structured array,
# using "outfile.dat" as a memory-mapped file to keep RAM usage low
reader_np = savReaderWriter.SavReaderNp("stdFile.sav")
array = reader_np.to_structured_array("outfile.dat")

# write the structured array out as csv
np.savetxt("stdFile.csv", array, delimiter=",")
reader_np.close()
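One caveat worth checking against your data: np.savetxt's default format is "%.18e", which only works for purely numeric columns. If the .sav file contains string variables, you may need to pass an explicit format, for example:

    # "%s" formats every field as a string, which handles mixed-type columns
    np.savetxt("stdFile.csv", array, delimiter=",", fmt="%s")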
Upvotes: 0