Most effective way to push data from a SQL Server database into a Greenplum database?

Question

Greenplum Database version: PostgreSQL 8.2.15 (Greenplum Database 4.2.3.0 build 1)

SQL Server Database version: Microsoft SQL Server 2008 R2 (SP1)

Our current approach:

1) Export each table to a flat file from SQL Server

2) Load the data into Greenplum with the psql.exe utility that ships with pgAdmin III (the PSQL Console)

Benefits...

Speed: OK; we load millions of rows in minutes, but is there anything faster?
Automation: OK; we call this utility from an SSIS package through a VB Script Task that uses Shell()

Pitfalls...

Reliability: the ETL depends on the file server to hold the flat files
Security: lots of potentially sensitive data sits on the file server
Error handling: it's a problem. psql.exe never raises an error that we can catch, even when it fails and loads no data or only part of a file

What else we have tried...

.Net Providers\Odbc Data Provider: we configured a System DSN using the DataDirect 6.0 Greenplum Wire Protocol driver. Performance is good for a DELETE, but dog-awful slow for an INSERT.

For reference, this is the aforementioned VB script in SSIS...

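Public Sub Main()

    Dim v_shell As Integer
    Dim v_psql As String

    ' Quote the exe path because it contains spaces; embedded quotes must be doubled in a VB string
    v_psql = """C:\Program Files\pgAdmin III\1.10\psql.exe"" -d ""MyGPDatabase"" -h ""MyGPHost"" -p ""5432"" -U ""MyServiceAccount"" -f \MyFileLocation\SSIS_load\sql_files\load_MyTable.sql"

    ' Run psql and wait for it to finish; Shell does not return psql's exit code
    v_shell = Shell(v_psql, AppWinStyle.NormalFocus, True)

End Sub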

These are the contents of the "load_MyTable.sql" file...

\copy MyTable from '\MyFileLocation\SSIS_load\txt_files\MyTable.txt' with delimiter as ';' csv header quote as '"'
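The error-handling pitfall likely has two causes in the setup above. By default psql keeps going after a failed command in a -f script and still exits with status 0, and VB's Shell() does not report the child process's exit status at all. Below is a minimal sketch, assuming the SSIS 2008 R2 Script Task (VB 2008 / .NET 3.5, hence the explicit `_` line continuations). It runs psql with `-v ON_ERROR_STOP=1` and reads `Process.ExitCode` instead of using Shell(). The module `PsqlLoader` and its helpers are hypothetical names; the connection values and paths are the placeholders from the question.

```vb
Imports System
Imports System.Diagnostics

' Hypothetical helper module for the SSIS Script Task (not part of any SSIS API).
Public Module PsqlLoader

    ' Builds the psql.exe argument string. "-v ON_ERROR_STOP=1" makes psql abort the
    ' -f script at the first failing command and exit with status 3 instead of 0.
    Public Function BuildPsqlArguments(ByVal database As String, ByVal host As String, _
                                       ByVal port As Integer, ByVal user As String, _
                                       ByVal scriptFile As String) As String
        Return String.Format("-d ""{0}"" -h ""{1}"" -p {2} -U ""{3}"" -v ON_ERROR_STOP=1 -f ""{4}""", _
                             database, host, port, user, scriptFile)
    End Function

    ' Starts psql.exe, waits for it to finish and returns its exit code:
    ' 0 = success, 1 = fatal psql error, 2 = server connection lost, 3 = script error with ON_ERROR_STOP.
    Public Function RunPsql(ByVal psqlPath As String, ByVal arguments As String) As Integer
        Dim startInfo As New ProcessStartInfo(psqlPath, arguments)
        startInfo.UseShellExecute = False   ' start psql.exe directly, not through the Windows shell
        startInfo.CreateNoWindow = True
        Using proc As Process = Process.Start(startInfo)
            proc.WaitForExit()
            Return proc.ExitCode
        End Using
    End Function

    ' Example wiring: call PsqlLoader.LoadMyTable() from the Script Task's Main().
    Public Sub LoadMyTable()
        Dim arguments As String = BuildPsqlArguments("MyGPDatabase", "MyGPHost", 5432, _
            "MyServiceAccount", "\MyFileLocation\SSIS_load\sql_files\load_MyTable.sql")
        Dim exitCode As Integer = RunPsql("C:\Program Files\pgAdmin III\1.10\psql.exe", arguments)
        If exitCode <> 0 Then
            ' An unhandled exception fails the Script Task, so the package's error handling can react
            Throw New Exception("psql.exe exited with code " & exitCode.ToString())
        End If
    End Sub

End Module
```

With ON_ERROR_STOP set, a failing \copy in load_MyTable.sql should end psql with a non-zero status, which the task then turns into a failure. Redirecting psql's standard error (ProcessStartInfo.RedirectStandardError) could add the actual error text to the exception message; it is left out here to keep the sketch short.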
