Reputation: 274
LOAD is a DB2 utility that I would like to use to insert data into a table from a CSV file. How can I do this in Python using the ibm_db
driver? I don't see anything in the docs here
CMD: LOAD FROM xyz OF del INSERT INTO FOOBAR
Running this as standard SQL fails as expected:
Transaction couldn't be completed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0104N An unexpected token "LOAD FROM xyz OF del" was found following "BEGIN-OF-STATEMENT". Expected tokens may include: "<space>". SQLSTATE=42601 SQLCODE=-104
Using the db2 CLP directly (i.e. os.system('db2 -f /path/to/script.file')
) is not an option as DB2 sits on a different machine that I don't have SSH access to.
EDIT:
Using the ADMIN_CMD
utility also doesn't work because the file being loaded cannot be put on the database server due to firewall. For now, I've switched to using INSERT
Upvotes: 1
Views: 5097
Reputation: 8663
Briefly: One solution is to use an SQLAlchemy adapter and Db2’s External Tables.
The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
Where above, an Engine references both a Dialect and a Pool, which together interpret the DBAPI’s module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine():
dialect+driver://username:password@host:port/database
Where dialect is a database name such as mysql, oracle, postgresql, etc., and driver the name of a DBAPI, such as psycopg2, pyodbc, cx_oracle, etc.
Transient external tables (TETs) provide a way to define an external table that exists only for the duration of a single query.
TETs have the same capabilities and limitations as normal external tables. A special feature of a TET is that you do not need to define the table schema when you use the TET to load data into a table or when you create the TET as the target of a SELECT statement.
Following is the syntax for a TET:
INSERT INTO <table> SELECT <column_list | *>
FROM EXTERNAL 'filename' [(table_schema_definition)]
[USING (external_table_options)];
CREATE EXTERNAL TABLE 'filename' [USING (external_table_options)]
AS select_statement;
SELECT <column_list | *> FROM EXTERNAL 'filename' (table_schema_definition)
[USING (external_table_options)];
For information about the values that you can specify for the external_table_options
variable, see External table options.
General example
INSERT INTO EMPLOYEE SELECT * FROM external '/tmp/employee.dat' USING (delimiter ',' MAXERRORS 10 SOCKETBUFSIZE 30000 REMOTESOURCE 'JDBC' LOGDIR '/logs' )
pip install ibm-db
pip install SQLAlchemy
One example below shows how it works together.
from sqlalchemy import create_engine
usr = "enter_username"
pwd = "enter_password"
hst = "enter_host"
prt = "enter_port"
db = "enter_db_name"
#SQL Alchemy URL
conn_params = "db2+ibm_db://{0}:{1}@{2}:{3}/{4}".format(usr, pwd, hst, prt, db)
shema = "enter_name_restore_shema"
table = "enter_name_restore_table"
destination = "/path/to/csv/file_name.csv"
try:
print("Connecting to DB...")
engine = create_engine(conn_params)
engine.connect() # optional, output: DB2/linux...
print("Successfully Connected!")
except Exception as e:
print("Unable to connect to the server.")
print(str(e))
external = """INSERT INTO {0}.{1} SELECT * FROM EXTERNAL '{2}' USING (CCSID 1208 DELIMITER ',' REMOTESOURCE LZ4 NOLOG TRUE )""".format(
shema, table, destination
)
try:
print("Restoring data to the server...")
engine.execute(external)
print("Data restored successfully.")
except Exception as e:
print("Unable to restore.")
print(str(e))
Upvotes: 1
Reputation: 17118
You could CALL
the ADMIN_CMD
procedure. ADMIN_CMD has support for both LOAD
and IMPORT
. Note that both commands require the loaded/imported file to be on the database server.
The example is taken from the DB2 Knowledge Center:
CALL SYSPROC.ADMIN_CMD('load from staff.del of del replace
keepdictionary into SAMPLE.STAFF statistics use profile
data buffer 8')
Upvotes: 1
Reputation:
LOAD
is an IBM command line processor command, not an SQL command. Is such, it isn't available through the ibm_db
module.
The most typical way to do this would be to load the CSV data into Python (either all the rows or in batches if it is too large for memory) then use a bulk insert to insert many rows at once into the database.
To perform a bulk insert you can use the execute_many
method.
Upvotes: 2