Reputation: 313
I'm having problems storing a large amount of data using the HDF5 functionality in pandas.
What I'm trying to do is download a huge amount of data (millions of rows) from a SQL server and store it locally for quick and easy access.
I defined some functions to create a connection to the SQL server and run the queries. The code works fine until it tries to store the data. I split the data across several datasets and .h5 files. Some of the datasets are stored correctly, but I think the bigger ones are causing problems, and I receive the following error:
HDF5ExtError: Problems creating the Array pandas
Is there a way to fix this?
This is the code I'm using:
import pymssql
import pandas as pd
import time

user = 'xxx'
password = '123'
server = 'SQL_Server'

def connect():
    """
    Connect to the SQL database and return a connection object.
    """
    connection = pymssql.connect(host=server, user=user, password=password, database=user+"_db")
    return connection

def query(query, connection, index=None):
    """
    Execute an SQL query and return the result as a data frame.
    """
    df = pd.read_sql_query(query, connection, index_col=index)
    return df

def store_data(connection):
    # One .h5 file per group of 8 snapshots, processed in reverse order.
    for i in reversed(range(0, 8)):
        st = 'store' + str(i) + '.h5'
        store = pd.HDFStore(st)
        for j in reversed(range(i*8, i*8 + 8)):
            q = "Some query here"
            df = query(q, connection)
            name = 'snap' + str(j)
            print('Storing data for ' + name + " in " + st)
            store[name] = df
            time.sleep(30)
        print('[DONE] - Closing ' + st)
        store.close()
Upvotes: 1
Views: 2564
Reputation: 2248
This error could be caused by running out of space on the drive you are trying to save to - check the free space on your target drive.
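One way to check this from Python before each write is sketched below. It is only an illustration, not part of the original code: it assumes Python 3.3 or later (for `shutil.disk_usage`), and the helper names `free_bytes` and `has_room_for` are made up for this example. The required size is something you would have to estimate yourself, since the on-disk size of an HDF5 dataset will not exactly match the in-memory size of the data frame.

```python
import shutil

def free_bytes(path="."):
    """Return the number of free bytes on the drive that holds `path`."""
    return shutil.disk_usage(path).free

def has_room_for(required_bytes, path="."):
    """Return True if the drive holding `path` has at least `required_bytes` free."""
    return free_bytes(path) >= required_bytes

if __name__ == "__main__":
    # Hypothetical threshold: require roughly 1 GiB free before writing.
    needed = 1024 ** 3
    print("Free space: %.1f GiB" % (free_bytes(".") / 1024.0 ** 3))
    if not has_room_for(needed, "."):
        print("Not enough free space in the target directory")
```

You could call something like `has_room_for(...)` with the directory of the .h5 file just before `store[name] = df`, and skip or abort the write when it returns False.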
Upvotes: 4