Reputation: 1542
I have continuous sensor data arriving as files every 5 minutes. I want to pick up only the files from the past hour and do the required processing. For example, if the Talend job runs at 12:01 pm, it should pick up only the files from 11:00 am to 12:00 pm.
Can anyone suggest an approach for doing this within Talend? Is there a built-in component that can pick up files from the previous hour?
Upvotes: 0
Views: 1950
Reputation: 451
Use tFileProperties, which has a built-in schema that includes a column named mtime_string. This column gives you the last-modified time of the file, and in tJava or tJavaRow you can check whether that time falls within the past hour using the TalendDate functions.
Iterate over all the files and write this code in tJavaRow:
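// mtime_string holds the last-modified time in Date.toString() format
Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", input_row.mtime_string);
Date current_date = TalendDate.getCurrentDate();
// Compare in minutes: diffDate returns a whole number (long), so "HH" with "<= 1" would also accept files well over an hour old
if (TalendDate.diffDate(current_date, lastModifiedDate, "mm") <= 60) {
    output_row.abs_path = input_row.abs_path;
}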
This gives you all the files modified within the past hour.
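The tJavaRow check above uses a rolling window relative to the current time. The example in the question (a 12:01 run picks up 11:00 to 12:00) is really a clock-aligned hour, which a small plain-Java helper could compute; in Talend something like this might live in a user routine and be called from tJavaRow. This is only a sketch under assumptions: the class and method names are hypothetical, and it assumes the timestamp is in epoch milliseconds, such as java.io.File.lastModified() returns (tFileProperties also has an mtime column that, as far as I know, holds this value).

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: decides whether a file belongs to the previous clock hour.
public class PreviousHourWindow {

    // Start of the previous clock hour: a run at 12:01 gives 11:00.
    public static ZonedDateTime windowStart(ZonedDateTime now) {
        return now.truncatedTo(ChronoUnit.HOURS).minusHours(1);
    }

    // End of the window (exclusive): the start of the current clock hour, e.g. 12:00.
    public static ZonedDateTime windowEnd(ZonedDateTime now) {
        return now.truncatedTo(ChronoUnit.HOURS);
    }

    // Assumes mtimeMillis is epoch milliseconds (as returned by File.lastModified()).
    // Half-open interval [start, end): a file stamped exactly 12:00:00 goes to the next run.
    public static boolean inPreviousHour(long mtimeMillis, ZonedDateTime now) {
        long start = windowStart(now).toInstant().toEpochMilli();
        long end = windowEnd(now).toInstant().toEpochMilli();
        return mtimeMillis >= start && mtimeMillis < end;
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        ZonedDateTime run = ZonedDateTime.of(2024, 1, 15, 12, 1, 0, 0, zone);
        long at1130 = ZonedDateTime.of(2024, 1, 15, 11, 30, 0, 0, zone).toInstant().toEpochMilli();
        long at1200 = ZonedDateTime.of(2024, 1, 15, 12, 0, 0, 0, zone).toInstant().toEpochMilli();
        System.out.println(inPreviousHour(at1130, run)); // prints true
        System.out.println(inPreviousHour(at1200, run)); // prints false
    }
}
```

With a half-open clock-hour window, each 5-minute file is picked up by exactly one hourly run even if a run starts a few minutes late, whereas a rolling "now minus 60 minutes" window can skip or double-count files when the start time drifts.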
Hope this helps.
Here is the complete job design:
tFileList--->(iterate)---->tFileProperties---->(row1 main)---->tJavaRow---->if---->tFileInputDelimited---->main----->tMap---->main----->tFileOutput
If you are setting a context variable in tJavaRow, check that it is not null or empty in the If condition:
context.getProperty("file") != null && !context.getProperty("file").isEmpty()
After that, use the context variable as you already do.
Upvotes: 2
Reputation: 186
There is no built-in component that will give you files based on time.
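However, you can accomplish this with tFileList --> tFileProperties. Configure tFileList to sort by last modified date, newest first, and tFileProperties will give you each file's modified date. From there you can filter on the date value: as soon as a file is older than an hour, stop; otherwise, process it. Stopping early only works because of the descending sort, since every file after the first old one is older as well.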
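The sort-then-stop idea can also be sketched in plain Java, outside Talend, to show why the order matters: once files are sorted newest first, the loop can break at the first file older than the cutoff. This is an illustration under assumptions, not Talend's implementation: the class and method names are hypothetical, and java.io.File.lastModified() stands in for the modified date that tFileProperties reports.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring "tFileList sorted by modified date, then stop".
public class RecentFiles {

    // Returns regular files in dir modified at or after cutoffMillis, newest first.
    public static List<File> modifiedSince(File dir, long cutoffMillis) {
        List<File> result = new ArrayList<>();
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            return result; // dir is not a directory, or it could not be read
        }
        // Newest first, like sorting tFileList by modified date in descending order
        Arrays.sort(files, (a, b) -> Long.compare(b.lastModified(), a.lastModified()));
        for (File f : files) {
            if (f.lastModified() < cutoffMillis) {
                break; // every remaining file is older, so stop here
            }
            result.add(f);
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long oneHourAgo = System.currentTimeMillis() - 60L * 60 * 1000;
        for (File f : modifiedSince(dir, oneHourAgo)) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Passing a clock-aligned cutoff (for example 11:00 for a 12:01 run) instead of "now minus one hour" would match the example in the question more closely, though files from the current hour would then also need to be excluded.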
Upvotes: 0