Reputation: 21
I'm completely new to Talend and data integration in general. So far I have only done simple things, such as loading a CSV file into an Oracle table with Talend. Below is my current requirement; I'm looking for ideas on how to get started.
Request: There is a folder in a Unix environment where the source application pushes out .csv files daily at 5 AM. They are named as follows:
Filename_20200301.csv, Filename_20200302.csv, Filename_20200303.csv, and so on up to the current day.
I have to create a Talend Job that parses these CSV files every morning and loads them into an Oracle table where my BI/reporting team can consume the data. This table will be used as a lookup table, and the source guarantees that no duplicate records are sent in the CSV files. The files usually have about 250-300 rows per day. The plan is to keep an eye on the volume and, if the row count grows in the future, possibly limit the date range to a rolling 12 months. Currently I have files from March 1st, 2020 up to today. The destination Oracle schema/table is always the same.
Tools: Talend Data Fabric 7.1
I can think of the steps below, but have no idea how to get started on steps 1) and 2):
1) Connect to a Unix server/shared location. I have the server details, username and password, but which component should I use in Metadata?
2) Parse the files in the above location. Should I use tFileList? Where does tFileInputDelimited come in?
3) Maybe use tMap for some cleanup/datatype changes before using tDBOutput to push into Oracle. I have used these components in the past; I just have to figure out how to insert into the Oracle table instead of doing a truncate/load.
Any thoughts or other good ways of doing this would be appreciated. Am I going down the right path?
Upvotes: 0
Views: 1621
Reputation: 5141
Please use the flow below:
tFTPFileList --> tFileInputDelimited --> tMap --> tOracleOutput
If you are picking the files from the local server rather than a remote one, please use tFileList instead of tFTPFileList.
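The question mentions files named Filename_YYYYMMDD.csv and a possible rolling 12-month window. As a rough sketch only, here is how that date check could look in plain Java, for example inside a routine or a tJava component. The class name `DailyFileName` and its methods are hypothetical, not part of Talend, and the pattern assumes the literal prefix `Filename_` from the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Talend API: extracts the date from
// names such as Filename_20200301.csv and applies a rolling window.
public class DailyFileName {
    // Assumption: the prefix is literally "Filename_", as in the question.
    private static final Pattern NAME = Pattern.compile("Filename_(\\d{8})\\.csv");
    // BASIC_ISO_DATE parses dates written as yyyyMMdd.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns the date encoded in the file name, or empty if the name
    // does not match the pattern or the digits are not a valid date.
    public static Optional<LocalDate> fileDate(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(m.group(1), FMT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // True if the file date lies between (today - months) and today, inclusive.
    public static boolean inWindow(String name, LocalDate today, int months) {
        LocalDate cutoff = today.minusMonths(months);
        return fileDate(name)
                .map(d -> !d.isBefore(cutoff) && !d.isAfter(today))
                .orElse(false);
    }
}
```

Calling `DailyFileName.inWindow(name, LocalDate.now(), 12)` for each file name returned by the file-list step would skip anything older than 12 months.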
Upvotes: 0
Reputation: 562
For step 1, you can use tFTPGet, which copies your files from the Unix server/shared location to your local machine or job server.
Then for step 2, as you mentioned, you can use a combination of tFileList and tFileInputDelimited.
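To make the roles of the two components concrete, here is a minimal plain-Java sketch of what the pair does conceptually: one step lists the files that match a mask, the other reads each file as delimited rows. This is not the code Talend generates; `CsvFolderReader` and its methods are hypothetical names, and the split is naive (it does not handle quoted fields that contain the separator).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of "list files, then read each as delimited rows".
public class CsvFolderReader {

    // Role of the file-list step: return files in dir matching a glob
    // such as "Filename_*.csv", sorted by path.
    public static List<Path> listFiles(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files);
        return files;
    }

    // Role of the delimited-input step: skip headerLines, then split each
    // non-empty line on the separator. Assumption: no quoted separators.
    public static List<String[]> readRows(Path file, String sep, int headerLines)
            throws IOException {
        List<String[]> rows = new ArrayList<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (int i = headerLines; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty fields.
            rows.add(line.split(Pattern.quote(sep), -1));
        }
        return rows;
    }
}
```

In a Talend job the same link is usually made by iterating from tFileList to tFileInputDelimited and setting the input's file name to the current path of the iteration, so each file found is read in turn.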
Hope this helps.
Upvotes: 1