Henry

Reputation: 23

How do I avoid a memory error on EC2 when loading a huge table into a Pandas dataframe?

I tried to connect to Redshift and load my huge fact table into a pandas dataframe as below, and I always encounter a memory error when I execute the script. I am thinking that either the chunked-loading part is incorrect, or I shouldn't load the whole fact table into a dataframe at all. Can someone point me in the right direction?

import pandas as pd
import psycopg2

# Connect to Redshift (credentials omitted)
conn = psycopg2.connect(dbname='', user='', host='', port='',
                        password='')
df = pd.DataFrame()

# Read the fact table in chunks of 1000 rows, but append every
# chunk back onto a single dataframe
for chunk in pd.read_sql(
        "select * from MyFactTable ",
        con=conn, chunksize=1000):
    df = df.append(chunk)

Upvotes: 0

Views: 278

Answers (1)

Jiří Baum

Reputation: 6930

Yeah, the df = df.append(chunk) part means that you're loading the whole table into memory at once, so you might as well not chunk.

If possible, process each chunk separately; depending on your calculation, that might be easy or difficult. You should also push as much of the processing as possible into the SQL query (eg. if you only need some rows, use a WHERE clause) — see the sketch below.
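For example, a minimal sketch of per-chunk processing, assuming all you need is a simple aggregate (the column names amount and sale_date, the date filter and the chunk size are made up for illustration) — each chunk is reduced and then discarded instead of being appended to an ever-growing dataframe:

import pandas as pd
import psycopg2

conn = psycopg2.connect(dbname='', user='', host='', port='',
                        password='')

# Hypothetical: push the row filter into SQL and only keep a running total.
running_total = 0
for chunk in pd.read_sql(
        "select amount from MyFactTable where sale_date >= '2021-01-01'",
        con=conn, chunksize=10000):
    # Only the current chunk is held in memory; aggregate it, then move on.
    running_total += chunk['amount'].sum()

print(running_total)

This way peak memory use is bounded by the chunk size rather than by the size of the whole fact table.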

PS: Depending on how large the table is and what you need to do with it, you could potentially also choose an EC2 instance with more memory; not an elegant solution, but sometimes throwing resources at a problem works...

Upvotes: 1
