kab

Reputation: 195

Inserting data into Redshift using Python

I'm trying to insert multiple rows into an Amazon Redshift database. The rows are contained in a list of tuples that looks like this:

my_rows=[(1, 0.0, 0, 0.0, 2010188534, 1816780086, 1113834, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 1, 0.0, 2010188536, 1816780086, 1119396, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 2, 0.0, 2010188538, 1816780086, 1119398, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 3, 0.0, 2010188540, 1816780086, 1123612, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 0, 0.0, 2010188542, 1816780102, 1086852, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 1, 0.0, 2010188544, 1816780102, 1087014, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 2, 0.0, 2010188546, 1816780102, 1089224, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 3, 0.0, 2010188548, 1816780102, 1089348, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 4, 0.0, 2010188550, 1816780102, 1122564, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17')]

Some columns may contain None.

I'm inserting them row by row into the Redshift database this way:

    cur = con.cursor()
    columns_names = ("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10")
    # Build the "(c1, c2, ...)" and "(%s, %s, ...)" clauses as strings
    # (concatenating the tuple directly into the SQL string raises a TypeError)
    columns_clause = "(" + ", ".join(columns_names) + ")"
    values_references = "(" + ", ".join(["%s"] * len(my_rows[0])) + ")"
    insert_query = "INSERT INTO " + table + " " + columns_clause + " VALUES " + values_references + ";"
    for row in my_rows:
        cur.execute(insert_query, row)
    con.commit()  # without a commit the inserts are never persisted

The problem is that when I run this code, it blocks on the first row without raising any error. So my questions are: is it normal for a single insert to take this long? If not, is there an error in my code? And is there a more efficient way to do this?

Can I get some help, please? Thank you in advance.

Upvotes: 0

Views: 4571

Answers (1)

Jon Scott

Reputation: 4354

The process you should follow:

  1. Write your data in CSV format to an S3 folder, ideally gzipped.
  2. Run a Redshift COPY command to import that data into a temporary table in Redshift.
  3. Run Redshift SQL to insert that data into your target table.
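The steps above can be sketched as follows. The bucket name, key, table names, and IAM role ARN here are all hypothetical placeholders, and the wiring at the bottom assumes a psycopg2 connection `con` and the boto3 library; only the two helper functions are runnable as-is:

```python
import csv
import gzip
import io


def rows_to_gzipped_csv(rows):
    """Serialise a list of tuples to gzipped CSV bytes (step 1).

    None values become empty fields, which Redshift's CSV format
    loads as NULL for non-character columns by default."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(["" if v is None else v for v in row])
    return gzip.compress(buf.getvalue().encode("utf-8"))


def build_copy_sql(staging_table, bucket, key, iam_role_arn):
    """Build the COPY command that imports the gzipped CSV from S3
    into a staging table (step 2). The IAM role must allow the
    Redshift cluster to read from the bucket."""
    return (
        "COPY {} FROM 's3://{}/{}' "
        "IAM_ROLE '{}' "
        "FORMAT AS CSV GZIP "
        "TIMEFORMAT 'auto';"
    ).format(staging_table, bucket, key, iam_role_arn)


# Hypothetical wiring (requires boto3 and a live cluster):
# body = rows_to_gzipped_csv(my_rows)
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="staging/my_rows.csv.gz", Body=body)
# cur = con.cursor()
# cur.execute(build_copy_sql("staging_table", "my-bucket",
#                            "staging/my_rows.csv.gz",
#                            "arn:aws:iam::123456789012:role/MyRedshiftRole"))
# cur.execute("INSERT INTO my_table SELECT * FROM staging_table;")  # step 3
# con.commit()
```

COPY loads the file in parallel across the cluster's slices, which is why a single COPY of thousands of rows is far faster than thousands of single-row INSERTs.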

That will run fast, is the correct and recommended way, and will be scalable.

Upvotes: 1
