Reputation: 136
I'm running a SQL query using cx_Oracle in Python, and the result of the query is a list. The dimensions of the list are 180 columns * 200,000+ rows. Whenever I try to convert it into a DataFrame using pd.DataFrame I run into a MemoryError. As a workaround I've tried breaking down my query by adding filters or querying only a few columns, which works. But if I change some filters I run into the error again, and I can't always be sure how many rows a query will return.
So I'm looking for any alternative data structures/libraries/packages that can be used, or a way to handle this within Pandas. Since I'm doing data analysis with Pandas, I would prefer a way to handle this in Pandas rather than in another library.
The fields in the list are in either float, string, or timestamp format.
Upvotes: 2
Views: 442
Reputation: 210812
Try to read data directly into Pandas DataFrame:
import cx_Oracle  # pip install cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')
df = pd.read_sql('select * from table_name where ...', engine)
PS: you may also want to make use of the chunksize parameter, which makes read_sql return an iterator of smaller DataFrames instead of building one huge frame in memory, as sketched below.
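For example, a minimal sketch of chunked reading (the connection string, table name, and chunk size of 50,000 rows are placeholders, as above):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')

# with chunksize set, read_sql returns an iterator of DataFrames,
# each holding at most 50,000 rows, so memory use stays bounded
chunks = pd.read_sql('select * from table_name where ...', engine, chunksize=50000)

for chunk in chunks:
    # process each chunk separately, e.g. aggregate and discard it
    print(len(chunk))
This way you never hold all 200,000+ rows in memory at once, at the cost of doing your analysis per chunk (or aggregating incrementally).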
Upvotes: 2