benz

Reputation: 4629

Java Removing Special Arabic Characters

I have a requirement to write a utility that removes some special characters from a given String input, and I don't see how to approach it. I have been given a database procedure that does this, and I need to replicate the same algorithm in Java. Here is the procedure:

create or replace procedure dbimm.check_arabic_letters (name_a in out varchar2) as
      pos      number(3);
      strlen   number(3);
      nxtchar  char(1);
      ascval   number(3);
begin
      replace_mult_spaces(name_a);
      strlen := length(name_a);
      pos := 1;
      while pos <= strlen loop
         nxtchar := substr(name_a, pos, 1);
         ascval  := ascii(nxtchar);
      --   dbms_output.put_line(to_char(ascval));
         if (ascval between 193 and 218) or
            (ascval between 225 and 234) or
            (ascval in  (32,38,40,41,47,247, 248, 249, 250))
         then
             pos := pos + 1;
         else
            raise_application_error(-20000,display_message(9));
         end if;
      end loop;
      name_a := replace(name_a, 'ي ','ى ');
      if substr(name_a, strlen) = 'ي' then
          name_a := substr(name_a, 1, strlen - 1) || 'ى';
      end if;
      name_a := replace(name_a, 'ة ', 'ه ');
      if substr(name_a, strlen) = 'ة' then
          name_a := substr(name_a, 1, strlen - 1) || 'ه';
      end if;

      /*   Old code commented by Mobeen
      name_a := replace(name_a, ' عبد ',' عبد');
      if instr(name_a,'عبد ') = 1 and length(name_a) > 4 then
          name_a := substr(name_a, 1, 3) || substr(name_a,5);
      end if;
      */
      -------

     name_a := replace(name_a,'أ','ا');
      name_a := replace(name_a,'إ','ا');
      name_a := replace(name_a,'آ','ا');
      --m name_a := replace(name_a,'لا','?');
      name_a := replace(name_a,chr(250),'لا');
      name_a := replace(name_a,chr(247),'لا');
      name_a := replace(name_a,chr(248),'لا');
      name_a := replace(name_a,chr(249),'لا');
      name_a := replace(name_a,chr(63),'لا');

      --- New Code added by Patrick
      name_a := replace(name_a,   ' عبد ال', ' عبدال');
        if substr(name_a,1,6)= 'عبد ال' then  --start
         name_a:= 'عبدال'||substr(name_a,7);
      end if;
      ----

      name_a := replace(name_a, ' ابن ',' بن '); --middle
      if substr(name_a,1,4)='ابن ' then  --start
         name_a:='بن '||substr(name_a,5);
      end if;
      if substr(name_a,-4)=' ابن' then --end
         name_a:=substr(name_a,1,length(name_a)-4)||' بن';
      end if;
      -------

I started replicating it in my Java class like this:

import java.io.UnsupportedEncodingException;

public class ReplaceSpecialArabicCharacUtil {


  /**
   * This method is responsible for replacing special arabic
   * Characters from the input given to the method. This method
   * Algorithm is taken from the database procedure already been
   * used for blacklist.
   * @param nameInArabic name in Arabic of applicant. E.g First name, last name
   * @return
   */
  public static String removeSpecialArabicCharacters(String nameInArabic){

    //Step-1 Remove multiple spaces. Take the procedure replica from Naveed
     nameInArabic = nameInArabic.replaceAll(" ې" ,"ی ");


    return nameInArabic;
  }

  /**
   * Driver method responsible for testing the Algorithm.
   * It is replicated from the Database Procedure.
   * @param args
   */
  public static void main(String[] args) throws UnsupportedEncodingException {

    String s ="ې ";
   // System.out.println(removeSpecialArabicCharacters(s).getBytes("UTF-8"));

  }

}

replaceAll does not seem to handle the spaces. I am not sure whether I am approaching the problem the right way. Can someone help? I want to write this utility correctly.

Thanks, Ben

Upvotes: 0

Views: 1093

Answers (1)

Gabriel Ruiu

Reputation: 2803

As best I could, I have mimicked your procedure in Java, except for replace_mult_spaces, since I don't know what it does.
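Since replace_mult_spaces is not shown in the question, the following is only a guess at what it might do: it assumes the procedure collapses runs of spaces into a single space and trims the ends. The class and method names are hypothetical.

```java
final class MultiSpaceSketch {

    // Hypothetical stand-in for replace_mult_spaces. The real procedure is not
    // shown, so "trim the ends and collapse runs of spaces" is an assumption.
    static String collapseSpaces(String input) {
        // trim() strips leading/trailing characters up to U+0020;
        // the regex then turns two or more consecutive spaces into one.
        return input.trim().replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + collapseSpaces("  a   b  c ") + "]"); // prints "[a b c]"
    }
}
```

If the real procedure behaves differently (for example, if it leaves leading spaces alone), this helper would need to change accordingly.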

NOTE: when you copy and paste this you may run into compilation errors, because neither my IDE nor Stack Overflow handles Arabic characters very well. You may have to tweak the code yourself until you get the desired result.
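One way to sidestep the editor problem is to write the Arabic letters as Unicode escapes, which also makes the logical order of letters and spaces unambiguous in right-to-left text. The sketch below (hypothetical names) applies the word-final yeh rule from the procedure this way. String.replace treats spaces literally, so when a pattern fails to match, a likely cause is the order of the space and the letter, or a look-alike letter (for example U+06D0 or U+06CC in place of U+064A or U+0649).

```java
final class EscapedArabicSketch {

    static final String YEH = "\u064A";           // Arabic letter yeh
    static final String ALEF_MAKSURA = "\u0649";  // Arabic letter alef maksura

    // Replaces a word-final yeh (yeh followed by a space, or yeh as the last
    // character) with alef maksura, as the first rule of the procedure does.
    static String normalizeFinalYeh(String s) {
        String out = s.replace(YEH + " ", ALEF_MAKSURA + " ");
        if (out.endsWith(YEH)) {
            out = out.substring(0, out.length() - 1) + ALEF_MAKSURA;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two words, each spelled ain, lam, yeh, separated by one space.
        String input = "\u0639\u0644\u064A \u0639\u0644\u064A";
        // Print code points instead of glyphs so the result is readable
        // even on a console that cannot display Arabic.
        for (char c : normalizeFinalYeh(input).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Printing code points rather than glyphs is a convenient way to check which letters a string really contains.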

Here is the Java equivalent of your procedure:

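public class ReplaceSpecialArabicCharacUtil {

    // Arabic letters are written as Unicode escapes so the source survives
    // editors and encodings that handle right-to-left text poorly.
    private static final String YEH = "\u064A";
    private static final String ALEF_MAKSURA = "\u0649";
    private static final String TEH_MARBUTA = "\u0629";
    private static final String HEH = "\u0647";
    private static final String ALEF = "\u0627";
    private static final String LAM_ALEF = "\u0644\u0627";
    private static final String ABD = "\u0639\u0628\u062F";
    private static final String AL = "\u0627\u0644";
    private static final String IBN = "\u0627\u0628\u0646";
    private static final String BIN = "\u0628\u0646";

    // ascii() in the procedure returns values in the database character set,
    // while a Java char is a UTF-16 code unit, so 193..250 cannot be compared directly.
    // ASSUMPTION: the database set follows the ISO-8859-6 layout, where 193..218
    // are U+0621..U+063A and 225..234 are U+0641..U+064A.
    // ASSUMPTION: 247..250 reach Java as the lam-alef ligatures U+FEF5..U+FEFC;
    // adjust this to whatever your driver actually produces.
    private static boolean isValid(char c) {
        return (c >= 0x0621 && c <= 0x063A)
            || (c >= 0x0641 && c <= 0x064A)
            || (c >= 0xFEF5 && c <= 0xFEFC)
            || " &()/".indexOf(c) >= 0;   // 32, 38, 40, 41, 47
    }

    public static String removeSpecialArabicCharacters(String name) {
        // replace_mult_spaces(name) is not shown in the question, so it is not replicated here.

        for (int pos = 0; pos < name.length(); pos++) {   // Java indexes start at 0
            if (!isValid(name.charAt(pos))) {
                throw new IllegalArgumentException("The string contains invalid characters");
            }
        }

        // Word-final yeh -> alef maksura, then word-final teh marbuta -> heh.
        name = replaceWordFinal(name, YEH, ALEF_MAKSURA);
        name = replaceWordFinal(name, TEH_MARBUTA, HEH);

        // Alef with hamza above, hamza below, or madda -> bare alef.
        name = name.replace("\u0623", ALEF).replace("\u0625", ALEF).replace("\u0622", ALEF);

        // Lam-alef ligatures (chr(247)..chr(250) in the procedure) -> lam + alef.
        // The procedure also replaces chr(63), '?', but '?' never passes the validation above.
        name = name.replaceAll("[\uFEF5-\uFEFC]", LAM_ALEF);

        // "abd al" -> "abdal", in the middle and at the start of the name.
        name = name.replace(" " + ABD + " " + AL, " " + ABD + AL);
        if (name.startsWith(ABD + " " + AL)) {
            name = ABD + AL + name.substring(ABD.length() + 1 + AL.length());
        }

        // "ibn" -> "bin", in the middle, at the start and at the end of the name.
        name = name.replace(" " + IBN + " ", " " + BIN + " ");
        if (name.startsWith(IBN + " ")) {
            name = BIN + " " + name.substring(IBN.length() + 1);
        }
        if (name.endsWith(" " + IBN)) {
            name = name.substring(0, name.length() - IBN.length() - 1) + " " + BIN;
        }

        // Strings are immutable, so the result has to be returned to the caller.
        return name;
    }

    // Replaces 'from' with 'to' where it is followed by a space or ends the string.
    private static String replaceWordFinal(String s, String from, String to) {
        s = s.replace(from + " ", to + " ");
        if (s.endsWith(from)) {
            s = s.substring(0, s.length() - from.length()) + to;
        }
        return s;
    }
}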

You can compare the two side by side to get a better feel for it.
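One caveat when comparing: ascii() in the procedure returns the character's value in the database character set, whereas a Java char is a UTF-16 code unit, so an Arabic letter cast to int never lands in 193..250. The allowed ranges happen to line up with the ISO-8859-6 layout, in which byte 193 is U+0621 and byte 234 is U+064A (an offset of 0x560). Whether the database really uses such a layout is an assumption, since the question does not state its character set. Under that assumption, a minimal sketch (hypothetical names) of mapping a Java char back to the value the procedure would test:

```java
final class DbByteValueSketch {

    // U+0621 minus 193. Valid only under the ASSUMED ISO-8859-6 layout.
    private static final int OFFSET = 0x0560;

    // Returns the single-byte value ascii() would presumably report for c,
    // or -1 if c has no counterpart under the assumed layout.
    static int toDbByteValue(char c) {
        if (c < 0x80) {
            return c; // ASCII characters keep their values
        }
        if ((c >= 0x0621 && c <= 0x063A) || (c >= 0x0640 && c <= 0x0652)) {
            return c - OFFSET;
        }
        return -1;
    }

    // Mirrors the IF condition of the procedure on the mapped value.
    // The values 247..250 are left out: ISO-8859-6 does not define them and the
    // question does not say which characters the database stores there.
    static boolean isAllowed(char c) {
        int v = toDbByteValue(c);
        return (v >= 193 && v <= 218)
            || (v >= 225 && v <= 234)
            || v == 32 || v == 38 || v == 40 || v == 41 || v == 47;
    }

    public static void main(String[] args) {
        System.out.println(toDbByteValue('\u0621')); // prints 193
        System.out.println(toDbByteValue('\u064A')); // prints 234
    }
}
```

If the database uses a different character set, the offset arithmetic does not apply and the mapping would have to come from that set's code chart instead.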

Upvotes: 1
