Reputation: 37
I have a sample transformation setup for the purpose of this question:
Table Input step -> Table output step.
When running the transformation and looking at the live step metrics, I see the Table Output step loading ~11 rows per second, which is extremely slow. My commit size in the Table Output step is set to 1000. The SQL query in the Table Input step returns 40k rows in about 10 seconds when run on its own, without the Table Output step attached. The input and output tables are located in the same database.
System info: PDI 8.0.0.0, Windows 10, SQL Server 2017
Upvotes: 1
Views: 1963
Reputation: 1145
The Table Output step is in general very slow.
If I'm not entirely mistaken, it issues a separate INSERT for each incoming row, which takes a lot of time.
A much faster approach is a bulk load step, which streams data from inside Kettle to a named pipe and loads it with MySQL's "LOAD DATA INFILE 'FIFO File' INTO TABLE ..." statement (note that this particular mechanism is MySQL-specific).
You can read more about how bulk loading works here: https://dev.mysql.com/doc/refman/8.0/en/load-data.html
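Roughly, the statement such a bulk load executes looks like the sketch below; the table name, FIFO path and delimiters are placeholders, not values from the question:

-- MySQL: load rows streamed into a FIFO file in one bulk operation
LOAD DATA INFILE '/tmp/kettle_fifo'
INTO TABLE target_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';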
Anyway: since you are reading from one table and writing to another table in the same database, I would use an 'Execute SQL script' step and do the whole operation with a single query.
If you take a look at this post, you can learn more about updating a table from another table in a single SQL query: SQL update from one Table to another based on a ID match
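A minimal sketch of that single-query idea in T-SQL (the table and column names below are made up, since the question doesn't show the actual schema):

-- Copy all rows from the source table to the target table in one set-based statement
INSERT INTO dbo.TargetTable (Id, Name, Amount)
SELECT Id, Name, Amount
FROM dbo.SourceTable;

-- Or, if the target already holds the rows and only needs refreshing,
-- update it from the source based on a key match
UPDATE t
SET t.Name = s.Name,
    t.Amount = s.Amount
FROM dbo.TargetTable AS t
JOIN dbo.SourceTable AS s
    ON t.Id = s.Id;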
Upvotes: 2