Reputation: 136
I'm running a SQL query using cx_Oracle in Python, and the result of the query is a list. The dimensions of the list are 180 columns * 200,000+ rows. Whenever I try to convert it into a DataFrame using pd.DataFrame I run into a MemoryError. As a workaround I've tried breaking down my query by adding filters or querying only a few columns, which works. But if I change some filters I run into the error again, and I can't always be sure how many rows a query will return.
So I'm looking for any alternative data structures/libraries/packages that can be used, or a way to handle this within Pandas. Since I'm doing data analysis with Pandas, I would prefer a way to handle this in Pandas rather than in another library.
The fields in the list are in either float, string, or timestamp format.
Upvotes: 2
Views: 442
Reputation: 210812
Try to read data directly into Pandas DataFrame:
import cx_Oracle  # pip install cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')
df = pd.read_sql('select * from table_name where ...', engine)
PS: you may also want to make use of the chunksize parameter, which makes read_sql return an iterator of smaller DataFrames instead of building one huge frame in memory, as sketched below.
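For example, a minimal sketch of chunked reading (the connection string, table name, and chunk size of 50,000 rows are placeholders, as above):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')

# with chunksize set, read_sql returns an iterator of DataFrames,
# each holding at most 50,000 rows, so memory use stays bounded
chunks = pd.read_sql('select * from table_name where ...', engine, chunksize=50000)

for chunk in chunks:
    # process each chunk separately, e.g. aggregate and discard it
    print(len(chunk))
This way you never hold all 200,000+ rows in memory at once, at the cost of doing your analysis per chunk (or aggregating incrementally).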
Upvotes: 2