Jemme

Reputation: 313

Python Pandas storing error: (HDF5ExtError: Problems creating the Array)

I'm having problems storing a large amount of data using HDF5 functionality in Pandas.

What I'm trying to do is download a huge amount of data (millions of rows) from a SQL server and store it locally for quick and easy access.

I defined some functions to create a connection to the SQL server and run the queries. The code works fine until it tries to store the data. I split the data into several datasets across several .h5 files. Some of the datasets are stored correctly, but I think the bigger ones are causing problems, and I receive the following error:

HDF5ExtError: Problems creating the Array

Is there a way to fix this?

This is the code I'm using:

import pymssql
import pandas as pd
import time

user = 'xxx'
password = '123'
server='SQL_Server'

def connect():
    """
    Connect to the SQL database and return a connection object.
    """
    connection = pymssql.connect(host=server, user=user, password=password, database=user+"_db")
    return connection

def query(query, connection, index=None):
    """
    Execute an SQL query and return the result as a data frame.
    """
    df = pd.read_sql_query(query, connection, index_col=index)
    return df

def store_data(connection):
    # Eight .h5 files, each holding eight datasets (snap0 .. snap63 overall).
    for i in reversed(range(0, 8)):
        st = 'store' + str(i) + '.h5'
        store = pd.HDFStore(st)
        for j in reversed(range(i*8, i*8 + 8)):
            q = "Some query here"
            df = query(q, connection)
            name = 'snap' + str(j)
            print('Storing data for ' + name + " in " + st)
            store[name] = df
            time.sleep(30)
        print('[DONE] - Closing ' + st)
        store.close()

Upvotes: 1

Views: 2564

Answers (1)

tim654321

Reputation: 2248

This error could be caused by running out of space on the drive you are trying to save to - check the free space on your target drive.
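
If you want to do that check from Python before each write, something like the following might help. It is only a rough sketch: `free_bytes`, `has_room`, and the `safety_factor` of 1.5 are names and values I made up for illustration, and `shutil.disk_usage` needs Python 3.3 or later (the code in the question looks like Python 2, where you would have to check the drive another way).

```python
import shutil


def free_bytes(path="."):
    """Return the number of free bytes on the drive that holds `path`."""
    return shutil.disk_usage(path).free


def has_room(needed_bytes, path=".", safety_factor=1.5):
    """Rough check that `needed_bytes` plus some headroom fits on the drive.

    `safety_factor` is an arbitrary margin (an assumption, not a measured
    value); the on-disk size of an HDF5 dataset can differ from the
    in-memory size of the data frame.
    """
    return free_bytes(path) >= needed_bytes * safety_factor


if __name__ == "__main__":
    print("Free space in the current directory: %d bytes" % free_bytes("."))
```

Inside `store_data` you could estimate the size of each frame with `df.memory_usage(deep=True).sum()` and call `has_room(...)` on that before `store[name] = df`. Treat the estimate as a ballpark figure only.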

Upvotes: 4
