Reputation: 75
I have a SQL file with multiple SQL statements and I need to read them from a text file using Kettle / Pentaho PDI 6.1.0.
All the statements are separated by semicolons; however, each statement may span multiple lines:
CREATE TABLE Staging01 AS
SELECT ....
WHERE ...;
UPDATE Staging01
SET ....
WHERE ...;
I need to parse the file, treating each SQL statement as a single string.
I tried to use the Text File Input step, but it always reads the data line by line.
Can you help?
Thank you.
Upvotes: 1
Views: 2396
Reputation: 1196
Text-File-Input is fine, just read every line as a single field: e.g. select FIXED mode with a sufficiently large field width and create a field named line. Don't forget to enable trimming on the right side.
Given your sample input, the idea is to drop empty lines (the default setting for TFI) and to concatenate the lines that form one statement. We know that the last line of each statement ends with a semicolon.
Let's create a flag _seq using the User-Defined-Java-Expression line.endsWith(";") ? 1 : 0.
Field _seq marks the last line of each statement, which is off by one row for our purposes: the group number should change on the line after the semicolon. So we use Analytic-Query to fetch the previous row's value of _seq (a LAG of one row) into a new field seq. If-Field-Value-Is-Null turns the null on the first row into a 0 for us.
Now let's use Group-By to calculate a field group as a cumulative sum of seq. Eventually we can use Group-By again, this time grouping on group, to concatenate the line values with a space character as separator.
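For the sample input, the intermediate fields should then look roughly like this (seq is the previous row's _seq, group is the cumulative sum of seq):

line                        _seq  seq  group
CREATE TABLE Staging01 AS    0     0     0
SELECT ....                  0     0     0
WHERE ...;                   1     0     0
UPDATE Staging01             0     1     1
SET ....                     0     0     1
WHERE ...;                   1     0     1

The final Group-By then yields one row per group: the concatenated CREATE TABLE statement and the concatenated UPDATE statement.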
This is one dataflow solution. You could of course script a simple FSM instead, but then why use Kettle at all?
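For comparison, here is a minimal sketch of such a script in plain Java; the file name statements.sql is just a placeholder, and the logic simply accumulates trimmed lines until one ends with a semicolon.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SqlStatementSplitter {
    public static void main(String[] args) throws IOException {
        // Placeholder input path; point this at your own SQL file.
        List<String> lines = Files.readAllLines(Paths.get("statements.sql"));
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                        // drop empty lines, like TFI does by default
            }
            if (current.length() > 0) {
                current.append(' ');             // join continuation lines with a space
            }
            current.append(line);
            if (line.endsWith(";")) {            // end of the current statement
                statements.add(current.toString());
                current.setLength(0);
            }
        }
        for (String stmt : statements) {
            System.out.println(stmt);
        }
    }
}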
Upvotes: 1
Reputation: 21
I think you should use "Row denormalizer" after adding a column that groups all the rows of one SQL statement (this new field changes whenever ";" is encountered). To keep the ";" in the stream as well, don't specify it as a separator.
Upvotes: 0