JasonHong

Reputation: 41

How to replace characters using Regex

I received a string from an IBM mainframe like the one below (2-byte graphic fonts):

"　;Ａ;Ｂ;Ｃ;Ｄ;Ｅ;Ｆ;Ｇ;Ｈ;Ｉ;Ｊ;Ｋ;Ｌ;Ｍ;Ｎ;Ｏ;Ｐ;Ｑ;Ｒ;Ｓ;Ｔ;Ｕ;Ｖ;Ｗ;Ｘ;Ｙ;Ｚ;ａ;ｂ;ｃ;ｄ;ｅ;ｆ;ｇ;ｈ;ｉ;ｊ;ｋ;ｌ;ｍ;ｎ;ｏ;ｐ;ｑ;ｒ;ｓ;ｔ;ｕ;ｖ;ｗ;ｘ;ｙ;ｚ;０;１;２;３;４;５;６;７;８;９;｀;－;＝;￦;～;！;＠;＃;＄;％;＾;＆;＊;（;）;＿;＋;｜;［;］;｛;｝;：;＂;＇;，;．;／;＜;＞;？;";

I want to convert these characters to their 1-byte ASCII equivalents.

How can I replace them in Java using java.util.regex.Matcher or String.replaceAll()?

Target characters:

;A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z;a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;0;1;2;3;4;5;6;7;8;9;`;-;=;\;~;!;@;#;$;%;^;&;*;(;);_;+;|;[;];{;};:;";';,;.;/;<;>;?;";

Upvotes: 2

Views: 468

Answers (2)

Alan Moore

Reputation: 75222

This is not (as other responders are saying) a character-encoding issue, but regexes are still the wrong tool. If Java had an equivalent of Perl's tr/// operator, that would be the right tool, but you can hand-code it easily enough:

public static String convert(String oldString)
{
  // Full-width source characters; the first one is U+3000 (IDEOGRAPHIC SPACE).
  String oldChars = "　ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ０１２３４５６７８９｀－＝￦～！＠＃＄％＾＆＊（）＿＋｜［］｛｝：＂＇，．／＜＞？";
  // ASCII replacements, position for position.
  String newChars = " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789`-=\\~!@#$%^&*()_+|[]{}:\"',./<>?";

  int len = oldString.length();
  StringBuilder sb = new StringBuilder(len);
  for (int i = 0; i < len; i++)
  {
    char ch = oldString.charAt(i);
    int pos = oldChars.indexOf(ch);
    // Characters not in the table are copied through unchanged.
    sb.append(pos < 0 ? ch : newChars.charAt(pos));
  }
  return sb.toString();
}

I'm assuming each character in the first string corresponds to the character at the same position in the second string, and that the first character (U+3000, 'IDEOGRAPHIC SPACE') should be converted to an ASCII space (U+0020).

Be sure to save the source file as UTF-8, and include the -encoding UTF-8 option when you compile it (or tell your IDE to do so).
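If the input really consists of Unicode fullwidth forms (an assumption based on the U+3000 remark above), the same mapping can probably be computed instead of looked up: U+FF01 through U+FF5E sit at a fixed offset of 0xFEE0 from ASCII 0x21 through 0x7E. Below is a minimal sketch under that assumption. The class and method names are made up for illustration, and treating the won sign as U+FFE6 (mapped to a backslash, as in the table above) is also a guess. Because the code uses Unicode escapes rather than literal characters, it does not depend on the source file encoding.

```java
public class FullwidthToAscii {
    // Hypothetical helper, not part of the answer's original code.
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch >= '\uFF01' && ch <= '\uFF5E') {
                // Fullwidth forms: fixed offset from ASCII 0x21..0x7E.
                sb.append((char) (ch - 0xFEE0));
            } else if (ch == '\u3000') {
                // Ideographic space becomes an ASCII space.
                sb.append(' ');
            } else if (ch == '\uFFE6') {
                // Assumption: fullwidth won sign maps to a backslash.
                sb.append('\\');
            } else {
                // Anything else is copied through unchanged.
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fullwidth "ABC", ideographic space, fullwidth "123", won sign.
        String input = "\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13\uFFE6";
        System.out.println(toAscii(input)); // prints: ABC 123\
    }
}
```

The lookup-table version is still the safer choice if the mainframe sends characters outside that contiguous block, since each pair can then be listed explicitly.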

Upvotes: 2

Kai Huppmann

Reputation: 10775

I don't think this one is about regex; it's about encoding. It should be possible to read the data into a String using the 2-byte encoding and then write it out in any other encoding. Look here for supported encodings.
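A rough sketch of what decoding and re-encoding looks like in Java is below. UTF-16BE stands in for the mainframe's actual 2-byte charset, which is not stated in the question, and the class and method names are hypothetical. Note what it shows: a fullwidth letter is a distinct code point with no US-ASCII representation, so re-encoding alone replaces it with '?' rather than producing the ASCII letter.

```java
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Hypothetical helper: decode 2-byte input, then re-encode as US-ASCII.
    static String roundTrip(byte[] raw) {
        // Assumption: UTF-16BE is only a stand-in for the real source charset.
        String decoded = new String(raw, StandardCharsets.UTF_16BE);
        // getBytes substitutes the charset's replacement byte for unmappable characters.
        byte[] ascii = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // U+FF21 (fullwidth A) followed by U+0041 (ASCII A).
        byte[] raw = {(byte) 0xFF, (byte) 0x21, 0x00, 0x41};
        System.out.println(roundTrip(raw)); // prints: ?A
    }
}
```

So decoding with the right charset is a necessary first step, but the fullwidth characters still need an explicit mapping afterwards.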

Upvotes: 0
