StatguyUser

Reputation: 2665

Clean accented character and white space in column in Talend

I have a workflow as follows. In the 'summary' column, I want to:

  1. remove question marks (?)
  2. remove white space from the text
  3. replace accented letters with their English equivalents, for example é with e.


Thanks in advance!!

Upvotes: 0

Views: 2189

Answers (2)

Corentin

Reputation: 2552

Removing question mark(?)

In your tMap, use StringHandling.EREPLACE(row.yourString,"\\?",""). The second argument is treated as a regular expression, so the question mark has to be escaped.

white space from the text

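In your tMap, use StringHandling.TRIM(row.yourString). Note that TRIM only strips leading and trailing white space; to remove white space inside the text as well, use row.yourString.replaceAll("\\s+","")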
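
For reference, here is a minimal plain-Java sketch of these two steps outside Talend. The class and method names are hypothetical, not part of Talend. It shows that String.replace takes the question mark literally, while regex-based calls such as replaceAll need it escaped, and that trim() and "remove all white space" are two different operations.

```java
public class SummaryCleaner {

    // Remove every literal question mark.
    // String.replace works on plain text, so "?" needs no escaping here.
    public static String removeQuestionMarks(String s) {
        return s.replace("?", "");
    }

    // Same result with a regular expression: "?" is a regex quantifier,
    // so it must be escaped as "\\?" to match the character itself.
    public static String removeQuestionMarksRegex(String s) {
        return s.replaceAll("\\?", "");
    }

    // Remove all white space (spaces, tabs, line breaks), not just the ends.
    public static String removeAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeQuestionMarks("Is this ok?"));
        System.out.println(removeAllWhitespace(" a b "));
        // trim() only strips leading and trailing white space.
        System.out.println(" a b ".trim());
    }
}
```

Inside a tMap expression, the same String methods can be called directly on the column, for example row.yourString.replace("?", "").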

replace accented alphabets with the english equivalent. For example é into e.

In your tMap, use TalendString.removeAccents(row.yourString)

You don't have to import any additional libraries, since the TalendString class is already available.

All of these functions (and many more) are accessible through the Expression Builder in tMap.

Upvotes: 1

TRF

Reputation: 801

See my answer on the Talend community forum.

First, load the commons-lang3-3.4.jar file and import org.apache.commons.lang3.StringUtils. To do that, select "commons-lang3-3.4.jar" in the tLibraryLoad Basic settings, then in the Advanced settings enter import org.apache.commons.lang3.StringUtils; in the import field.

In tJavaRow, enter the following (something similar may work in tMap, depending on your use case):

output_row.line = StringUtils.stripAccents(input_row.line);

tFixedFlowInput is only there to generate data for the flow ("aaaéééàààçççbbbb" in my example), and the result is:

aaaeeeaaacccbbbb
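
If loading an extra jar is not an option, a similar effect can be sketched with the JDK alone using java.text.Normalizer. This is an assumption-laden alternative, not what StringUtils.stripAccents does internally: the AccentStripper class is hypothetical, and letters without a canonical decomposition (such as ø) are left unchanged by this approach.

```java
import java.text.Normalizer;

public class AccentStripper {

    // Decompose each accented letter into base letter + combining mark (NFD),
    // then delete the combining marks, leaving only the base letters.
    public static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks such as the acute accent.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Same sample as above, written with Unicode escapes:
        // \u00e9 = e acute, \u00e0 = a grave, \u00e7 = c cedilla.
        String sample = "aaa\u00e9\u00e9\u00e9\u00e0\u00e0\u00e0\u00e7\u00e7\u00e7bbbb";
        System.out.println(stripAccents(sample)); // prints aaaeeeaaacccbbbb
    }
}
```

In a tJavaRow, the two lines inside stripAccents could be applied directly to input_row.line, with java.text.Normalizer referenced by its fully qualified name.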

Hope this helps,
TRF

Upvotes: 0
