Talend: pull information from one column and input it into another column

Question

I am using Talend Studio to merge about 80 log files into one large file. The files are just standard txt files. I currently have a job set up to merge all the files together (they use the same headers and formatting), but my issue is the following.

The first column contains the user's login ID. If the user is running off the server, the login ID is captured in the log; if they are running locally, it is not. When the login ID is null or blank, I need to find the login from the file path in column 4.

The path is set up as either C:\Documents and Settings\(login id here)\.... or C:\Users\(login id here)\.... or C:\DOCUME~1\(login id here)\... So the login ID is always between the second and third backslashes. However, I am new to Talend and am not sure what to put in the expression to pull this data out and put it in the login ID field.

If anyone has a way of doing this, or can lead me in the right direction it would be very helpful!

ydaetskcoR · Accepted Answer

You can use a tExtractRegexFields component to extract the login ID from your filepath and then conditionally map this to the login ID column in a tMap if the login ID field is null or blank.

A typical job to do this might look like:

Typical job set up

This has an input of data (in this case a tFixedFlowInput component to hard code the values to the job), a tExtractRegexFields component to extract the login ID from the filepath column and then a tMap component to map the data conditionally.

Log data

The values in the above tFixedFlowInput component cover the combinations you see in your log. They also include a row whose login ID differs from the one in its filepath, so you can see that the existing login ID is not always overwritten and the one from the filepath is used only where necessary.

After this we need to configure the tExtractRegexFields component to look at the filepath column and find capture groups. I used the regex "^C:\\(?:Users|DOCUME~1|Documents and Settings)\\(\w+)\\", which captures the run of "word-like" characters (letters, digits and underscores) between the second and third backslashes. You may have to tweak this to get the right results for your users, for example if login IDs can contain dots or hyphens. The schema for the tExtractRegexFields component also requires you to add an extra column for each capture group (which is also why I made the alternation a non-capturing group), and it fills these columns sequentially. So if you have 3 capture groups in your regex but only 2 extra fields then only the first 2 capture groups will be used.
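To check the pattern outside of Talend, here is a minimal plain-Java sketch of the same extraction. The class name LoginFromPath, the method extractLogin and the sample paths are made up for illustration and are not part of the job. Note that in a Java string literal every regex backslash has to be doubled, so the pattern looks noisier here than the raw regex above; you may need the same escaping when typing the pattern into the component's regex field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only for trying the regex outside of Talend.
final class LoginFromPath {

    // Raw regex: ^C:\\(?:Users|DOCUME~1|Documents and Settings)\\(\w+)\\
    // Written as a Java string literal, so each backslash is doubled.
    private static final Pattern LOGIN_IN_PATH = Pattern.compile(
            "^C:\\\\(?:Users|DOCUME~1|Documents and Settings)\\\\(\\w+)\\\\");

    // Returns the login ID captured from the path, or null when the
    // path is null or does not match the pattern.
    static String extractLogin(String filepath) {
        if (filepath == null) {
            return null;
        }
        Matcher m = LOGIN_IN_PATH.matcher(filepath);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample paths are invented for the demo.
        System.out.println(extractLogin("C:\\Users\\jsmith\\logs\\app.txt"));
        System.out.println(extractLogin("C:\\DOCUME~1\\jdoe\\app.txt"));
        System.out.println(extractLogin("C:\\Documents and Settings\\asmith\\app.txt"));
        System.out.println(extractLogin("D:\\Temp\\app.txt"));
    }
}
```

A path outside the three known roots yields null, and so does a login containing characters that \w does not cover (such as a dot), which is the case where the pattern would need tweaking.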

Finally, we use some simple logic in our tMap component to use the extracted login ID where necessary:

tMap configuration

Here we define a boolean variable that tests whether the login field is null or blank and if so we use the previously defined regexLogin value, otherwise we use the original login value.
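The same decision can be sketched in plain Java. The class LoginResolver and the method resolveLogin are hypothetical names used only for this illustration; in the tMap itself the equivalent logic would be written as expressions on the incoming row's columns, and treating whitespace-only values as blank is an assumption about what "blank" means in your logs.

```java
// Hypothetical helper mirroring the tMap logic described above.
final class LoginResolver {

    // True when the login from the log is missing: null, empty,
    // or (by assumption) made up only of whitespace.
    static boolean isNullOrBlank(String login) {
        return login == null || login.trim().isEmpty();
    }

    // Use the login extracted from the filepath only when the
    // original login is missing; otherwise keep the original.
    static String resolveLogin(String login, String regexLogin) {
        return isNullOrBlank(login) ? regexLogin : login;
    }

    public static void main(String[] args) {
        // Invented sample values for the demo.
        System.out.println(resolveLogin(null, "jsmith"));
        System.out.println(resolveLogin("", "jdoe"));
        System.out.println(resolveLogin("server_user", "local_user"));
    }
}
```

Keeping the null-or-blank test in its own boolean, as the tMap variable does, means the output expression stays a single readable conditional.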

And here's the result:

Output results

Notice how we successfully grab the user Ids from the 3 null or blank user Id entries and also how we defer to the original login Id when there's a clash between the login Id and the one we extracted from the filepath.
