Sriram

Reputation: 10558

Reading unicode characters from csv file

I have a csv file that contains words in English followed by their Hindi translations. I am trying to read the csv file and do some further processing with it. The csv file looks like this:

English,,Hindi,,,  
,,,,,  
Cat,,बिल्ली,,,  
Rat,,चूहा,,,  
abandon,,छोड़ देना,त्याग देना,लापरवाही की स्वतन्त्रता,जाने देना  

I am trying to read the csv file line by line and display what has been written. The code snippet (Java) is as follows:

   //Step 2. Read csv file and get the string.
            FileInputStream fis = null;
            BufferedReader br = null;
            try {
                fis = new FileInputStream(new File(csvFile));
            } catch (FileNotFoundException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }

            boolean startSeen = false; // becomes true once the header line is seen
            if(fis != null) {
                try {
                    br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
                } catch (UnsupportedEncodingException e2) {
                    // TODO Auto-generated catch block
                    e2.printStackTrace();
                    System.out.print("Unsupported encoding");
                }
                String line = null;
                if(br != null) {
                    try {
                        while((line = br.readLine()) != null) {
                            if(line.contains("English") == true) {
                                startSeen = true;
                            }

                            if((startSeen == true) && (line != null)) {
                                StringBuffer sbuf = new StringBuffer();
                                //Step 3. Parse the line.
                                sbuf.append(line);
                                System.out.println(sbuf.toString());
                            }
                        }
                    } catch (IOException e1) {
                        // TODO Auto-generated catch block
                        e1.printStackTrace();
                    }
                }  
}

However, the following output is what I get:

English,,Hindi,,,
,,,,,
Cat,,??????,,,
Rat,,????,,,
abandon,,???? ????,????? ????,???????? ?? ???????????,???? ????  

My Java is not that great and though I have gone through a number of posts on SO, I need more help in figuring out the exact cause of this problem.

Upvotes: 7

Views: 6816

Answers (3)

Kaushik Lele

Reputation: 6637

As discussed in the other answers, the solution takes two steps:

1) Save your text file as UTF-8.
2) Change the properties of your Java source file so that it uses UTF-8. In Eclipse: right-click the Java file, then Properties -> Resource -> Text file encoding -> Other -> UTF-8.

See the screenshot at http://howtodoinjava.com/2012/11/27/how-to-compile-and-run-java-program-written-in-another-language/
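
To confirm step 1 before touching any settings, one option is to check whether the bytes on disk decode cleanly as UTF-8. The following is a minimal sketch, not a definitive tool: the class name `Utf8Check` and the method `isValidUtf8` are made up for illustration, and the file path is expected as a command-line argument. Note that a file whose Hindi text was already replaced by literal `?` characters when it was saved would still pass this check, because plain ASCII is valid UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, introduced only for this illustration.
class Utf8Check {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Report errors instead of silently substituting characters.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check reports that the file is not valid UTF-8, the file was most likely exported in another encoding and needs to be saved again as UTF-8.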

Upvotes: 0

Jon Kartago Lamida

Reputation: 854

For reading a text file it is better to use a character stream, e.g. java.util.Scanner, directly instead of FileInputStream. As for the encoding, first make sure that the text file you want to read is saved as UTF-8 and not in some other encoding. I also noticed that on my system I had to save my Java source file as UTF-8 as well for the Hindi characters to be shown properly.

However, I want to suggest a simpler way to read the csv file, as follows:

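// Pass the charset explicitly; otherwise Scanner uses the platform default.
Scanner scan = new Scanner(new File(csvFile), "UTF-8");
while (scan.hasNextLine()) {
    System.out.println(scan.nextLine());
}
scan.close();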

see the output
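
The character-stream idea can also be sketched with `java.nio.file.Files` and an explicit charset, so the result does not depend on the platform default. This is only an illustration under stated assumptions: the class name `Utf8CsvReader`, the method `readLines`, and the fallback file name `words.csv` are hypothetical and do not come from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, introduced only for this illustration.
class Utf8CsvReader {

    // Reads every line of the file, decoding the bytes as UTF-8.
    static List<String> readLines(Path csvPath) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = Files.newBufferedReader(csvPath, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "words.csv" is an assumed file name; pass the real path as an argument.
        Path csvPath = Paths.get(args.length > 0 ? args[0] : "words.csv");
        for (String line : readLines(csvPath)) {
            System.out.println(line);
        }
    }
}
```

One design difference from `InputStreamReader(fis, "UTF-8")`: a reader created by `Files.newBufferedReader` throws an exception on malformed input instead of substituting replacement characters, so a file saved in the wrong encoding fails loudly. Whether the printed lines look right still depends on the console.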

Upvotes: 5

Evgeniy Dorofeev

Reputation: 135992

I think your console cannot display Hindi characters. Try

System.out.println("Cat,,बिल्ली,,,");

to test
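
If that test also prints question marks, the file reading is probably fine and the text is being lost on output, because `System.out` encodes with the platform default charset. A possible workaround is to wrap `System.out` in a `PrintStream` that encodes as UTF-8. This is a rough sketch: the class name `Utf8Console` and the method `utf8Out` are hypothetical, and the console itself must still be configured for UTF-8 and have a font with Devanagari glyphs. The string is written with Unicode escapes so that the source file's own encoding does not affect the test.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, introduced only for this illustration.
class Utf8Console {

    // Wraps the target stream so that characters are encoded as UTF-8.
    static PrintStream utf8Out(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Out(System.out);
        // Same row as in the csv file, written with Unicode escapes.
        out.println("Cat,,\u092c\u093f\u0932\u094d\u0932\u0940,,,");
    }
}
```

In Eclipse, the console encoding can be set in the run configuration (Common tab); if it stays on a non-Unicode charset, the output will still show `?` or garbled characters.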

Upvotes: 2
