nofel
nofel

Reputation: 1

Teradata TPT Load Script Performance

I hope someone can help me with the improvement for Teradata TPT load Script. I am using the below script to load a 3GB delimited CSV file into Teradata. The file resides on my local laptop hard drive. The file takes approximately 30 minutes to load which is quite long. The total number of rows to be loaded are approximately 30 million. Any recommendations on performance improvement.

DEFINE JOB LOAD_TD_FROM_CSV
DESCRIPTION 'Load Teradata table from CSV File'
(
  DEFINE SCHEMA FOOD_TPT   /*Define Table schema*/
  DESCRIPTION 'FOOD_TPT'
  (
                           col1 VARCHAR(20),
                           col2 VARCHAR(100),
                           col3 VARCHAR(100),
                           col4 VARCHAR(100),
                           col5 VARCHAR(100),
                           col6 VARCHAR(100),
                           col7 VARCHAR(100),
                           col8 VARCHAR(100)
                                  
  );


DEFINE OPERATOR DDL_OPERATOR  
TYPE DDL
ATTRIBUTES
(
 VARCHAR TdpId           = 'system', /*System Name*/
 VARCHAR UserName        = 'user',    /*USERNAME*/
 VARCHAR UserPassword    = 'password', /*Password*/ 
 VARCHAR Errorlist             ='3807' /*This is added to skip 'Table does not exist error' and treat it as warnig*/
 );


DEFINE OPERATOR LOAD_CSV    /*Load information*/
DESCRIPTION 'Operator to Load CSV Data'
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
  VARCHAR PrivateLogName,
  VARCHAR TraceLevel      = 'None',
  INTEGER TenacityHours   = 1,
  INTEGER TenacitySleep   = 1,
  INTEGER MaxSessions     = 4,
  INTEGER MinSessions     = 1,
  INTEGER BUFFERSIZE      =16,
  VARCHAR TargetTable     = 'FOOD_TPT_STG',   /*Define target table name where the file will be loaded*/
  VARCHAR LogTable        = 'FOOD_TPT_LOG',     /*Define Log table name*/
  VARCHAR ErrorTable1     = 'FOOD_TPT_STG_E1',/*There are 2 error tables. Define them. First table is _ET table*/
  VARCHAR ErrorTable2     = 'FOOD_TPT_STG_E2', /*Define _UV table*/
  VARCHAR TdpId           = 'system',  /*System Name*/
  VARCHAR UserName        = 'user',     /*Username*/
  VARCHAR UserPassword    = 'password'              /*Password*/

);

DEFINE OPERATOR READ_CSV
DESCRIPTION 'Operator to Read CSV File'
TYPE DATACONNECTOR PRODUCER
SCHEMA FOOD_TPT
ATTRIBUTES
(
  VARCHAR  Filename            =   'file.csv'  /*give file name with path*/
 ,VARCHAR Format               =   'Delimited'
 ,VARCHAR TextDelimiter        =   ','
 ,VARCHAR AcceptExcessColumns  =   'N'
 ,VARCHAR PrivateLogName       =   'LOAD_FROM_CSV'
 ,Integer SkipRows=1  /*skips the header in csv file*/
);


Step Setup_Tables  /*Enter all executable SQLs in this step*/
(
   APPLY
     ('Drop table FOOD_TPT_STG_E1;'), /*Drop error tables*/
     ('Drop table FOOD_TPT_STG_E2;'),
     ('Drop table FOOD_TPT_LOG;'),  /*Drop Log Table*/
     ('Drop table FOOD_TPT_STG;'), /*Drop Target staging tables*/
     ('CREATE  TABLE FOOD_TPT_STG ,NO FALLBACK ,
              NO BEFORE JOURNAL,
              NO AFTER JOURNAL,
              CHECKSUM = DEFAULT,
              datablocksize= 1022 kbytes,
              DEFAULT MERGEBLOCKRATIO
                  (
                    col1 VARCHAR(20),
                    col2 VARCHAR(100),
                    col3 VARCHAR(100),
                    col4 VARCHAR(100),
                    col5 VARCHAR(100),
                    col6 VARCHAR(100),
                    col7 VARCHAR(100),
                    col8 VARCHAR(100)
                )
                                         
       NO PRIMARY INDEX;')                                                                                                                                            /*Create Target table*/
TO OPERATOR (DDL_OPERATOR);
);

Step Load_Table
(
   APPLY ('INSERT INTO FOOD_RISK_TPT_STG
           (
           :col1
           ,:col2 
           ,:col3 
           ,:col4 
           ,:col5 
           ,:col6 
           ,:col7 
           ,:col8 
            );') /*Inserts records from CSV file into Target Table*/
TO OPERATOR (LOAD_CSV)

SELECT * FROM  operator(READ_CSV);
   
);

);

Thanks in advance

Upvotes: 0

Views: 1551

Answers (1)

dnoeth
dnoeth

Reputation: 60482

As Fred wrote, you specify a BUFFERSIZE of 16KB -> 22 rows per block (FastLoad calculates this based on the defined max size) which results in 1.6 million messages send. Remove the attribute and you get 1 MB as default, 1400 rows per block. Additionally you might simplify your scripts like this:

DEFINE JOB LOAD_TD_FROM_CSV
DESCRIPTION 'Load Teradata table from CSV File'
(

Step Setup_Tables  /*Enter all executable SQLs in this step*/
(
   APPLY
     ('Drop table FOOD_TPT_STG_E1;'), /*Drop error tables*/
     ('Drop table FOOD_TPT_STG_E2;'),
     ('Drop table FOOD_TPT_LOG;'),  /*Drop Log Table*/
     ('Drop table FOOD_TPT_STG;'), /*Drop Target staging tables*/
     ('CREATE  TABLE FOOD_TPT_STG ,NO FALLBACK ,
              NO BEFORE JOURNAL,
              NO AFTER JOURNAL,
              CHECKSUM = DEFAULT,
              datablocksize= 1022 kbytes,
              DEFAULT MERGEBLOCKRATIO
                  (
                    col1 VARCHAR(20),
                    col2 VARCHAR(100),
                    col3 VARCHAR(100),
                    col4 VARCHAR(100),
                    col5 VARCHAR(100),
                    col6 VARCHAR(100),
                    col7 VARCHAR(100),
                    col8 VARCHAR(100)
                )
                                         
       NO PRIMARY INDEX;') /*Create Target table*/
TO OPERATOR ($DDL
  (
    TdpId           = 'system', /*System Name*/
    UserName        = 'user',    /*USERNAME*/
    UserPassword    = 'password', /*Password*/ 
    Errorlist             ='3807' /*This is added to skip 'Table does not exist error' and treat it as warnig*/
  )
);

Step Load_Table
(
   APPLY ($INSERT 'FOOD_RISK_TPT_STG') /*Inserts records from CSV file into Target Table*/
   TO OPERATOR ($LOAD(
  /*     BUFFERSIZE      = 16384,  Default is 1 MB, increasing it further to the max 16MB might improve a bit */
      TargetTable     = 'FOOD_TPT_STG',   /*Define target table name where the file will be loaded*/
      LogTable        = 'FOOD_TPT_LOG',     /*Define Log table name*/
      ErrorTable1     = 'FOOD_TPT_STG_E1',/*There are 2 error tables. Define them. First table is _ET table*/
      ErrorTable2     = 'FOOD_TPT_STG_E2', /*Define _UV table*/
      TdpId           = 'system',  /*System Name*/
      UserName        = 'user',     /*Username*/
      UserPassword    = 'password'              /*Password*/
     )
   )
SELECT * FROM  operator($FILE_READER
(
    Filename             =   'file.csv'  /*give file name with path*/
   ,Format               =   'Delimited'
   ,TextDelimiter        =   ','
   ,AcceptExcessColumns  =   'N'
   ,PrivateLogName       =   'LOAD_FROM_CSV'
   ,SkipRows=1  /*skips the header in csv file*/
));

);

And there are job variables files to make scripts better reusable

Upvotes: 1

Related Questions