Romain Ferraton
Romain Ferraton

Reputation: 71

Export HANA SQL query or table in CSV

I use a HANA 2.0 database. I want to export a table or a sql query from the database to an external client as fast as possible and using a command line (i'm on windows)

I already tested differents cli methods with success but not as fast as a gui method (using HANA Studio Export)

The CSV generated is around 400MB, in all cases it is exactly the same data.

I successfully exported the result of a SQL query using hdbsql and -o option

$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"

$CSVDelimiter = '¤'  ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"

Measure-Command{hdbsql -U $HanaUserKey -o .\${TargetFile} -I D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL -V period=$periodstr -F $CSVDelimiter -x -resultencoding $encoding}

With hdbsql it took 65s to export Network BP used ~40Mbs:

hdbsql export time result

I successfully exported using ODBC (HDBODBC)

$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "@MAESTRO_VAL"

$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('&period',$periodstr)

$CSVDelimiter = 31  ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"

$conn = new-object System.Data.Odbc.OdbcConnection
$conn.connectionstring = "Driver={HDBODBC};SERVERNODE=${HanaUserKey};PACKETSIZE=2097152; PREFETCH=TRUE"
$conn.Open()
$cmd = New-object System.Data.Odbc.OdbcCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"

#Powershell 7 for UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}

$conn.Close()

it took 57s to export data to a dataset and 1.2s to write the dataset to csv : 58s (faster that hdbsql), Network BP used : ~50Mbs

HDBODBC export time result

We also successfully export using python (sqlalchemy + hdbcli) and pandas dataframes in about 35s (Network BP ~90 Mbs). We tried to change some parameters for ODBC/hdbcli like PACKETSIZE=2MB or PREFETCH=TRUE, without improving the cli extraction

And finally I successfully exported the same data that i had previously writen to a table (the create as select took 6s) using HANA Studio

Export using SAP HANA Studio Step 1 Export using SAP HANA Studio Step 2

it took 16s to received the data : more than 3x better !! Network BP used 500Mbs !

Network BP at 500Mbs during SAP HANA Studio Export to client

To summary results we currently have to export the same data :

Type Method Elapsed (sec)
CLI hdbsql 65
CLI HBDODBC+powershell dataset 58
CLI Python using pandas + sqlalchemy(hdbcli) 35
GUI SAP HANA Studio 16

The final question is : does someone know how to export data to an external client as fast as GUI SAP HANA Studio but in command line ?

Upvotes: 0

Views: 8540

Answers (2)

Romain Ferraton
Romain Ferraton

Reputation: 71

You were in the right way @Mathias Kemeter.

I asked the same question on the answers.sap.com forum and Jeff Albion give me a similar (but slightly different) answer : HANA Studio use a remote EXPORT before retrieving the data.

It was not EXPORT INTO, which is a special command that allow to export to a file on the HANA machine with the formatting you want (delimiter, quotes, row separator...) but EXPORT.

EXPORT is a different command that allow you to export to a file (like EXPORT INTO) but also to a temporary table (which is not documented...or i miss something).

With EXPORT you can only choose BINARY ou CSV output and you cannot choose the CSV format. Another difference between the two commands is that EXPORT INTO need special priviledges whereas EXPORT need only a GRANT SELECT on the source.

EXPORT will dump the source table in a temporary table table which will contains among other thing, a "cell" with all the data in a BLOB or CLOB column.

It was something I didn't know, so big thanks to Jeff Albion again !

So I wrote a complete article about How to Export HANA Data to CSV fast

As @Lars Br. ask :

How long does it take to fill the table you used for exporting by HANA Studio?

EXPORT will export data of a table, so if you have a query to export (like me in the OTIF4HDBSQL.SQL ) you will need to store the result of the query in a standard table the time of the export. So in my case it take 6s to export the query to a standard table before EXPORT the data.

I propose a solution using ADO + Powershell + Remote Export :

$hanaado = [Reflection.Assembly]::LoadFile("D:\Applications\sap\hdbclient\ado.net\v4.5\Sap.Data.Hana.v4.5.dll")

$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MYUSERVH3"

# ToDo : Use a CREATE AS SELECT Statement to store intermediate data because EXPORT statement only work with a standard table

$SQLQueryCTT = [IO.File]::ReadAllText("D:\applications\sql\export\TestExport-CTT.sql")  # CREATE TEMPORARY TABLE
$SQLQueryDTT = [IO.File]::ReadAllText("D:\applications\sql\export\TestExport-DTT.sql")  # DROP TEMPORARY TABLE
$SQLQueryExport = [IO.File]::ReadAllText("D:\applications\sql\export\TestExport-Export.sql") # REMOTE EXPORT AS CSV (build a one row, "one" usefull nclob column table)
$SQLQuerySelect = [IO.File]::ReadAllText("D:\applications\sql\export\TestExport-SelectBlob.sql") # SELECT FROM THE Temporary Table

#$CSVDelimiter = 31  ## US : User Separator ascii 31
#$UserQuotes = "Never"
$encoding = "UTF8"
$conn = New-Object Sap.Data.Hana.HanaConnection
$conn.connectionstring = "KEY=${HanaUserKey}"

try {
    $conn.Open()    
    try {  # DROP Temp Table IF EXISTS
            $cmddtt = New-object sap.data.Hana.HanaCommand($SQLQueryDTT,$conn) 
            write-host "****************************"
            write-host $SQLQueryDTT
            write-host "****************************"
            $ncmddtt=$cmddtt.ExecuteNonQuery()
            Write-Output "Try Drop Temporary Table Succeeded"
    }
    catch{}

    #CREATE Temp Table
    $cmdctt = New-object sap.data.Hana.HanaCommand($SQLQueryCTT,$conn) 
    write-host "****************************"
    write-host $SQLQueryCTT
    write-host "****************************"
    $ncmdctt=$cmdctt.ExecuteNonQuery();
    Write-Output "Create Temporary Table Succeeded"

    # Remote EXPORT as CSV
    $cmdexport = New-object sap.data.Hana.HanaCommand($SQLQueryExport,$conn) 
    write-host "****************************"
    write-host $SQLQueryExport
    write-host "****************************"
    Measure-Command{$nexport=$cmdexport.ExecuteNonQuery();}
    write-host "Remote Export Succeeded"

    
    # Retrieve DATA as one row x one CLOB column and write the result to a file
    $cmdselect = New-object sap.data.Hana.HanaCommand($SQLQuerySelect,$conn)
    write-host "****************************"
    write-host $SQLQuerySelect
    write-host "****************************"
    $dr = $cmdselect.ExecuteReader()
    write-host "****************************"
    write-host "Retrieve and Write CSV CLOB Content to File ${TargetDir}\${TargetFile}"
    write-host "****************************"
    rm ${TargetDir}\${TargetFile}
    
    Measure-Command{        
        #while ($dr.Read()) # retrieve the data
        #{
        $dr.Read()
         Out-File -InputObject $dr["CONTENT"] -FilePath ${TargetDir}\${TargetFile} -Encoding $encoding # write the data to a file
        #}
    }
    
    $dr.Close()

    $cmddtt = New-object sap.data.Hana.HanaCommand($SQLQueryDTT,$conn) 
    write-host "****************************"
    write-host $SQLQueryDTT
    write-host "****************************"
    $ncmddtt=$cmddtt.ExecuteNonQuery()
    Write-Output "Try Drop Temporary Table Succeeded"

    $conn.Close()
} 
catch { 
    Write-Output $_.Exception.Message
}
finally
{
    $conn.Dispose();
}

The result screen : HANA Export to CSV using ADO + Powershell + Remote EXPORT

The content of the SQL files :

TestExport-CTT.sql

CREATE LOCAL TEMPORARY TABLE #OTIF_EXPORT_202012 (FILENAME VARCHAR(64), PATH VARCHAR(255), CONTENT CLOB);

TestExport-Export.sql

EXPORT "T_OTIF_EXPORT_20212" AS CSV INTO #OTIF_EXPORT_202012 WITH NO DEPENDENCIES NO STATISTICS THREADS 1;

TestExport-SelectBlob.sql

SELECT "CONTENT" FROM #OTIF_EXPORT_202012 where filename='data.csv';

Upvotes: 2

Mathias Kemeter
Mathias Kemeter

Reputation: 1183

If I understood correctly, all your approaches are first transferring the result set to the client before creating the CSV. Possibly, HANA Studio is internally using the EXPORT INTO statement to generate the CSV on the server first.

Anyway, I would give it a shot by using EXPORT INTO in SQL and comparing runtimes. You can issue the statement from the command line with HDBSQL.

Upvotes: 1

Related Questions