Moving 5,000,000 rows to another Postgresql DBs by Clojure & JDBC

Question

I am trying to move 5,000,000 rows from one Postgre DB to another one. Both connections are in Hikari CP connection pool.

I went through a lot of documentation and posts. It left me with the code bellow. But it is not really usable:

(jdbc/with-db-connection [tx {:datasource source-db}]
  (jdbc/query tx
      [(jdbc/prepare-statement (jdbc/get-connection tx)
                                answer-sql
                                {:fetch-size 100000})]
                  {:result-set-fn (fn [result-set]
                                    (jdbc/insert-multi!
                                     {:datasource target-db}
                                     :migrated_answers
                                     result-set))}))

I've tried a lot of little different forms of this. jdbc/with-db-transaction or any other I can think of didn't help much.

A lot of tutorials and posts mention only the way how to process the result as a whole. It is absolutely ok with small tables that get in RAM but and it seems fast. But this is not the case.
So when I properly use :fetch-size and my RAM doesn't explode (hocus pocus) than the transfer IS very slow with both connections switching between 'active' and 'idle in transaction' states on DB sides. Ive never waited for so long to find any of the data actually transferred!

When I create this simple batch in Talend Open Studio (ETL tool generating Java code) it transfers all the data in 5 minutes. And the cursor-size is "also" set to 100000 there. I think that Clojure's clean code should be faster.

The fastest result that I've got was with this code below. I think it is because the :as-array parameter. If I don't use :max-rows parameter memory explodes because it is not processed lazily, so I can't use this for the whole transfet. Why? I don't understand the rules here.

(jdbc/with-db-transaction [tx {:datasource source-db}]
  (jdbc/query tx
              [(jdbc/prepare-statement (:connection tx)
                                        answer-sql
                                       {:result-type :forward-only
                                        :concurrency :read-only
                                        :fetch-size 2000
                                        :max-size 250000})]
              {:as-arrays? true
               :result-set-fn (fn [result-set]
                                (let [keys (first result-set)
                                      values (rest result-set)]
                                  (jdbc/insert-multi! 
                                     {:datasource dct-db}
                                      :dim_answers
                                       keys values)))}))

I will appreciate any help or info that I am clearly missing.

Akiz · Accepted Answer

This solution works best for me and it also seems faster than Taylor's solution. But huge thank for helping me.

It doesnt commit until the transaction is done. I have to experience any problems yet to see if I wont have to pimp it but i am happy for now. I've tried to replace first transaction with with-db-connection but it makes the records load straight into RAM.

(defn data->transfer2 [sql table]
     (jdbc/with-db-transaction [read-tx {:datasource dag-db}]
     (jdbc/with-db-transaction [tx {:datasource dct-db}]
        (jdbc/query read-tx
                  [(jdbc/prepare-statement (:connection read-tx)
                                           answer-sql
                                           {:result-type :forward-only
                                            :concurrency :read-only
                                            :fetch-size 100000})]
                  {:as-arrays? true
                   :result-set-fn (fn [result-set]
                                    (let [keys (first result-set)
                                          values (rest result-set)]
                                      (doseq [btch (partition-all 100000 values)]
                                        (jdbc/insert-multi! tx
                                                            :dim_answers
                                                             keys
                                                             btch))))})))

Moving 5,000,000 rows to another Postgresql DBs by Clojure & JDBC

Answers (2)

Related Questions

Moving 5,000,000 rows to another Postgresql DBs by Clojure &amp; JDBC

Answers (2)

Related Questions

Moving 5,000,000 rows to another Postgresql DBs by Clojure & JDBC