Reputation: 37
I need help with Powercenter. I have a flat file that I use that in some columns has text such as "Los Ángeles", which I would need to change to "Los Angeles", else the "Á" is shown as a null character, ending with "Los NULLngeles" or something like that.
Some additional information:
The source file is a flatfile .CSV separated with ";" that has, in file format, code page: 7-bit ASCII
The target file is a flatfile .TXT, with columns separator pipe "|". It has in file format, code page: MS Windows Latin 1 (ANSI), superset of Latin 1
I tried using REPLACECHR in the following code:
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(1,
REPLACECHR(0, clv, '|', ''),'á','a'),'é','e'),'í','i'),'ó','o'),'ú','u'),'Á','A'),'É','E'),'Í','I'),'Ó','O'),'Ú','U'),'ñ','n'),'Ñ','Ñ')
However this did not work. How can I make it so it does what I want it to do?
As an addon, the first REPLACECHR is to remove any pipes so the upload to a SQL server doesn't crash.
Upvotes: 1
Views: 3467
Reputation: 358
I don't have enough reputation to leave this as a comment otherwise I would've, but I'll pitch how I would solve this.
In informatica when you select data viewer and run the data viewer on your source can you see Los Angeles appearing as "Los Ángeles" or do you see it as "Los NULLngeles"?
if it's the latter this solution will not work as you'll need to sanitize the data in the source prior to loading it into informatica (can be done easily using python and regex)
how I would tackle this problem is using a java trasnformation
to iterate over your strings seeing if they contain any special characters and if they do then replace them at that index. The pseudo code will look like this:
for(i = 0; i < myString.length(); i++){
if(myString[i] == 'á' || myString[i] == 'Á')
myString[i] = 'a';
}
Upvotes: 2
Reputation: 11
You can try by setting the code page to UTF-8 encoding at the session level,file properties setting. And also check the code page of Flat file which you are trying to load,it should UTF-8. Hope this answer help you.
Upvotes: 0
Reputation: 310
This is a data quality issue which has to be taken back to the source team. It cannot be found using REPLACE_CHR as it is not an ASCII character.
As far as the mapping goes, the non-ascii characters like this can be removed using REG_REPLACE. Example: REG_REPLACE(PRODUCT_DESC.'[^[:print]]','')
Other ways of doing it: https://kb.informatica.com/howto/6/Pages/21/535269.aspx?
Upvotes: 0