Reputation: 8208
I have some files that I would like to consolidate into a single database table. The files have similar but different formats. The files look something like this:
FileOne:
FileTwo:
FileThree:
The destination table looks like this:
TableDestination:
I want to develop a mapping that ETLs these three files into this one database, but because the column names are different, it looks like I'll have to develop three different mappings, or three different sources, or three different somethings. The problem is that my example is contrived: I actually have many different files that all have different formats and column names, but the data is all very similar.
I would like to develop a single mapping or workflow that can handle all of this by only adding a table that holds the column mappings. Such a table would look like this based on the sample files and sample table above:
TableMappings:
In this way, to edit a column mapping I only have to make an edit this this TableMappings table. I wouldn't have to make any changes at all to the mapping or workflow. Nor would I have to redeploy an application.
What would a mapping or workflow look like that could take advantage of something like this? I assume there'd be a flat file source that takes files from a folder. There would be something in the middle that uses this TableMappings table to map column names. Finally there would be a relational data object that represents my destination database table "TableDestination". I don't know how to put this together though.
Upvotes: 1
Views: 3717
Reputation: 3353
This can also be done using expression. You'd need a "generic" Source Definition
(e.g. Column1
, Column2
, ..., ColumnN
) and in each port an expression that checks which port should be returned. E.g.
DECODE (SUBSTR(TargetColumnOrder,X,1),
'1', Column1,
'2', Column2,
...
'N', ColumnN)
with X being the port index.
The above assumes a bit different structure of the mappings table:
FileName | TargetColumnOrder
----------------------------
FileOne | 231
FileTwo | 527
Note 1: If there can be different number of columns, you need to check if Length(TargetColumnOrder)
is not less than port index, otherwise SUBSTRING will not work.
Note 2: The above solution has not been tested or even implemented. Please treat this as a general description rather than exact code base.
Upvotes: 0
Reputation: 3353
With Flat Files
as source, column names are not important. It doesn't even matter if the column count is matching. If the actual file will have more columns then Source Definition
, only the first n columns will be read (with n being the number of ports in Source Definition
). In the opposite situation, the extra ports will contain null values.
Having said that, loading multiple flat files is easy.
The problem would be if the column order is different and you want this additional static table that would define the column mapping. This is doable, i.e. Java Transformation
can be used to do the column mapping. But the whole solution is too complex for me to describe it here. I can try to answewr some precise, specific quesitons - but I'm not able to prepare and paste here the complete solution.
Upvotes: 1