Reputation: 4629
I need to write a utility that removes certain special characters from a given String input, and I am not sure how to approach it. I have been given a database procedure that already does this, and I need to replicate its algorithm in Java. Here is the procedure:
create or replace procedure dbimm.check_arabic_letters (name_a in out varchar2) as
pos number(3);
strlen number(3);
nxtchar char(1);
ascval number(3);
begin
replace_mult_spaces(name_a);
strlen := length(name_a);
pos := 1;
while pos <= strlen loop
nxtchar := substr(name_a, pos, 1);
ascval := ascii(nxtchar);
-- dbms_output.put_line(to_char(ascval));
if (ascval between 193 and 218) or
(ascval between 225 and 234) or
(ascval in (32,38,40,41,47,247, 248, 249, 250))
then
pos := pos + 1;
else
raise_application_error(-20000,display_message(9));
end if;
end loop;
name_a := replace(name_a, 'ي ','ى ');
if substr(name_a, strlen) = 'ي' then
name_a := substr(name_a, 1, strlen - 1) || 'ى';
end if;
name_a := replace(name_a, 'ة ', 'ه ');
if substr(name_a, strlen) = 'ة' then
name_a := substr(name_a, 1, strlen - 1) || 'ه';
end if;
/* Old code commented by Mobeen
name_a := replace(name_a, ' عبد ',' عبد');
if instr(name_a,'عبد ') = 1 and length(name_a) > 4 then
name_a := substr(name_a, 1, 3) || substr(name_a,5);
end if;
*/
-------
name_a := replace(name_a,'أ','ا');
name_a := replace(name_a,'إ','ا');
name_a := replace(name_a,'آ','ا');
--m name_a := replace(name_a,'لا','?');
name_a := replace(name_a,chr(250),'لا');
name_a := replace(name_a,chr(247),'لا');
name_a := replace(name_a,chr(248),'لا');
name_a := replace(name_a,chr(249),'لا');
name_a := replace(name_a,chr(63),'لا');
--- New Code added by Patrick
name_a := replace(name_a, ' عبد ال', ' عبدال');
if substr(name_a,1,6)= 'عبد ال' then --start
name_a:= 'عبدال'||substr(name_a,7);
end if;
----
name_a := replace(name_a, ' ابن ',' بن '); --middle
if substr(name_a,1,4)='ابن ' then --start
name_a:='بن '||substr(name_a,5);
end if;
if substr(name_a,-4)=' ابن' then --end
name_a:=substr(name_a,1,length(name_a)-4)||' بن';
end if;
-------
I started replicating it in a Java class like this:
public class ReplaceSpecialArabicCharacUtil {
/**
* This method is responsible for replacing special arabic
* Characters from the input given to the method. This method
* Algorithm is taken from the database procedure already been
* used for blacklist.
* @param nameInArabic name in Arabic of applicant. E.g First name, last name
* @return
*/
public static String removeSpecialArabicCharacters(String nameInArabic){
//Step-1 Remove multiple spaces. Take the procedure replica from Naveed
nameInArabic = nameInArabic.replaceAll(" ې" ,"ی ");
return nameInArabic;
}
/**
* Driver method responsible for testing the Algorithm.
* It is replicated from the Database Procedure.
* @param args
*/
public static void main(String[] args) throws UnsupportedEncodingException {
String s ="ې ";
// System.out.println(removeSpecialArabicCharacters(s).getBytes("UTF-8"));
}
}
replaceAll does not seem to handle the spaces. I am not sure whether I am approaching the problem the right way. Can someone help? I want to write this utility correctly.
Thanks, Ben
Upvotes: 0
Views: 1093
Reputation: 2803
I have mimicked your procedure in Java as best I could, except for replace_mult_spaces, since I don't know what it does.
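If replace_mult_spaces does what its name suggests, a stand-in could look like the sketch below. This is only a guess: the class name SpaceCollapser and the method collapseSpaces are hypothetical, and both the collapsing of space runs and the trimming of the ends are assumptions, because the real procedure is not shown.

```java
final class SpaceCollapser {
    private SpaceCollapser() {}

    // Hypothetical stand-in for replace_mult_spaces. The real procedure is not shown,
    // so collapsing runs of spaces and trimming both ends is an assumption.
    static String collapseSpaces(String input) {
        // " {2,}" is a regular expression matching two or more consecutive spaces.
        return input.trim().replaceAll(" {2,}", " ");
    }
}
```

Calling SpaceCollapser.collapseSpaces(name) before the validation loop would mirror the order of steps in the procedure.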
NOTE: when you copy and paste this you may well get compilation errors, because my IDE and StackOverflow don't handle Arabic characters very well. You will have to tweak the code yourself until you get the desired result.
Here is the Java equivalent of your procedure:
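One way to sidestep the copy-and-paste problem is to write the Arabic letters as Unicode escapes, so the source file stays pure ASCII. The sketch below shows the idea for the first replacement in the procedure. The class name ArabicLetters and the method yehBeforeSpace are made up for illustration. It uses String.replace, which treats its arguments as literal text, whereas replaceAll treats the first argument as a regular expression.

```java
final class ArabicLetters {
    private ArabicLetters() {}

    static final char YEH = '\u064A';          // ي
    static final char ALEF_MAKSURA = '\u0649'; // ى
    static final char TEH_MARBUTA = '\u0629';  // ة
    static final char HEH = '\u0647';          // ه

    // Replaces a yeh that is followed by a space with alef maksura,
    // like the procedure's first replace call.
    static String yehBeforeSpace(String name) {
        return name.replace(YEH + " ", ALEF_MAKSURA + " ");
    }
}
```

Because the letter always comes first in the string concatenation, there is no doubt about which side of the letter the space is on, which is hard to see when right-to-left text is typed directly into a literal.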
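import java.util.ArrayList;
import java.util.List;

public class ReplaceSpecialArabicCharacUtil {
    public static List<Integer> getValidAsciiValues() {
        List<Integer> validAsciiValues = new ArrayList<Integer>();
        for (int i = 193; i <= 218; i++) {
            validAsciiValues.add(i);
        }
        for (int i = 225; i <= 234; i++) {
            validAsciiValues.add(i);
        }
        validAsciiValues.add(32);
        validAsciiValues.add(38);
        validAsciiValues.add(40);
        validAsciiValues.add(41);
        validAsciiValues.add(47);
        validAsciiValues.add(247);
        validAsciiValues.add(248);
        validAsciiValues.add(249);
        validAsciiValues.add(250);
        return validAsciiValues;
    }

    public static String removeSpecialArabicCharacters(String name_a) {
        // replace_mult_spaces(name_a) is not shown in the question, so it is not reproduced here.
        List<Integer> validAsciiValues = getValidAsciiValues();
        // Caution: Oracle's ascii() returns the code in the database character set, while a
        // Java char is a UTF-16 code unit (Arabic letters are U+0600 and above). The values
        // in the list must be translated before this check can accept Arabic text.
        for (int pos = 0; pos < name_a.length(); pos++) { // the Java index is 0-based
            int asciiValue = name_a.charAt(pos);
            if (!validAsciiValues.contains(asciiValue)) {
                // stands in for raise_application_error(-20000, display_message(9))
                throw new IllegalArgumentException("The string contains invalid characters");
            }
        }
        // Yeh before a space, or at the end of the string, becomes alef maksura.
        name_a = name_a.replace("ي ", "ى ");
        if (name_a.endsWith("ي")) {
            name_a = name_a.substring(0, name_a.length() - 1) + "ى";
        }
        // Teh marbuta before a space, or at the end of the string, becomes heh.
        name_a = name_a.replace("ة ", "ه ");
        if (name_a.endsWith("ة")) {
            name_a = name_a.substring(0, name_a.length() - 1) + "ه";
        }
        // Alef with hamza or madda becomes a bare alef.
        name_a = name_a.replace('أ', 'ا');
        name_a = name_a.replace('إ', 'ا');
        name_a = name_a.replace('آ', 'ا');
        // chr(247) to chr(250) and chr(63) become the two letters lam + alef.
        name_a = name_a.replace(String.valueOf((char) 250), "لا");
        name_a = name_a.replace(String.valueOf((char) 247), "لا");
        name_a = name_a.replace(String.valueOf((char) 248), "لا");
        name_a = name_a.replace(String.valueOf((char) 249), "لا");
        name_a = name_a.replace(String.valueOf((char) 63), "لا");
        // "Abd Al" loses its inner space, in the middle and at the start of the string.
        name_a = name_a.replace(" عبد ال", " عبدال");
        if (name_a.startsWith("عبد ال")) {
            name_a = "عبدال" + name_a.substring(6);
        }
        // "Ibn" becomes "Bin" in the middle, at the start and at the end of the string.
        name_a = name_a.replace(" ابن ", " بن ");
        if (name_a.startsWith("ابن ")) {
            name_a = "بن " + name_a.substring(4);
        }
        if (name_a.endsWith(" ابن")) {
            name_a = name_a.substring(0, name_a.length() - 4) + " بن";
        }
        // Java strings are immutable, so the result has to be returned to the caller.
        return name_a;
    }
}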
You can compare the two side-by-side to get a better feeling.
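One caveat about the numeric comparison in the validation loop: a Java char holds a UTF-16 code unit, so Arabic letters never fall in the 193 to 250 range that the procedure checks. The sketch below restates the check in Unicode terms. It rests on an assumption that should be verified against the real database character set: that 193-218 and 225-234 are ISO-8859-6 positions, which correspond to U+0621..U+063A and U+0641..U+064A. Codes 247 to 250 are left out because their meaning depends on the character set. The class and method names are hypothetical.

```java
final class ArabicNameValidator {
    private ArabicNameValidator() {}

    // Assumption: the procedure's codes are ISO-8859-6 positions.
    // 193-218 would then be U+0621..U+063A and 225-234 would be U+0641..U+064A.
    static boolean isAllowed(char c) {
        if (c >= '\u0621' && c <= '\u063A') {
            return true;
        }
        if (c >= '\u0641' && c <= '\u064A') {
            return true;
        }
        // 32, 38, 40, 41 and 47 are plain ASCII: space & ( ) /
        return c == ' ' || c == '&' || c == '(' || c == ')' || c == '/';
    }

    // Returns true only if every character of the name passes isAllowed.
    static boolean isValidName(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (!isAllowed(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

If the database turns out to use a different character set, only the two ranges in isAllowed should need to change.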
Upvotes: 1