Reputation: 11
How do I search for a Unicode string in a file using Java? Below is the code that I have tried. It works for strings other than Unicode ones.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.*;
import java.util.*;
class file1
{
    public static void main(String arg[])throws Exception
    {
        BufferedReader bfr1 = new BufferedReader(new InputStreamReader(
                System.in));
        System.out.println("Enter File name:");
        String str = bfr1.readLine();
        BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
        String s;
        int count=0;
        int flag=0;
        System.out.println("Enter the string to be found");
        s=br.readLine();
        BufferedReader bfr = new BufferedReader(new FileReader(str));
        String bfr2=bfr.readLine();
        Pattern p = Pattern.compile(s);
        Matcher matcher = p.matcher(bfr2);
        while (matcher.find()) {
            count++;
        }
        System.out.println(count);
    }
}
Upvotes: 1
Views: 1078
Reputation: 1500495
Well, there are three potential sources of problems I can see:
FileReader, which always uses the platform default encoding. What's the encoding of the file you're trying to read? I would recommend using FileInputStream wrapped in an InputStreamReader with an explicit encoding (e.g. UTF-8) which matches the file.
To debug the real values in strings, I would usually use something like this:
private static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        System.out.printf("%d: %4h (%c)", i, c, c);
        System.out.println();
    }
}
That way you can see the exact UTF-16 code unit held in each char in the string.
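For the FileInputStream/InputStreamReader suggestion, here is a minimal sketch of what reading the file with an explicit encoding might look like. It assumes the file is UTF-8; substitute whatever encoding the file really uses. The class name UnicodeSearch and the helper countMatches are made up for illustration. Unlike the code in the question, this sketch scans every line rather than only the first, and it wraps the search text in Pattern.quote so that it is matched literally rather than as a regular expression; drop that if regex behaviour is wanted.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeSearch {
    // Hypothetical helper: counts occurrences of a literal string
    // in a file that is ASSUMED to be encoded as UTF-8.
    static int countMatches(String fileName, String needle) throws IOException {
        // Pattern.quote makes the search text a literal, not a regex.
        Pattern p = Pattern.compile(Pattern.quote(needle));
        int count = 0;
        // Explicit charset instead of FileReader's platform default.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java UnicodeSearch <file> <text>");
            return;
        }
        System.out.println(countMatches(args[0], args[1]));
    }
}
```

Note that the search text itself still arrives through the console or command line, so it is worth dumping its chars as shown above to check that it was decoded as expected.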
Upvotes: 3