SkyWalker

Reputation: 29150

listFiles() and isDirectory() cannot handle Unicode file names in Java 1.4

I am using Java 1.4 (a client requirement) together with lucene-core-2.9.2.jar and lucene-demos-2.9.2.jar, and I build with Ant. Everything works for all directories except those whose names contain Unicode or Scandinavian characters.

When I list a directory with listFiles(), every entry is returned, but the Unicode names are shown as blocks. When isDirectory() is then called on those entries, it does not recognize the folders whose names are in other languages (containing Unicode or Scandinavian characters), so they are not indexed.

How can I solve this problem for Unicode and Scandinavian characters?

With Java 6 or 7 it works well. Because the client requires Java 1.4, please don't tell me to use Java 5, 6, or 7; other suggestions are welcome. To make the problem clearer, I have added my code below.

public void addIntoIndex(File dir, IndexWriter indexWriter) {
    try {
        System.out.println("Now in addIntoIndex");

        /** The "Release_Notes" folder is excluded from indexing. */
        if (dir.getName().equals("Release_Notes") && this.searchOption.equals("systemHelp")) {
            System.out.println("'Release_Notes' folder will be excluded for indexing.");
            return;
        }

        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] htmls = dir.listFiles();
        if (htmls == null) {
            System.out.println("Cannot list: " + dir.getAbsolutePath());
            return;
        }

        for (int i = 0; i < htmls.length; i++) {
            String htmlPath = htmls[i].getAbsolutePath();

            if (htmls[i].isDirectory()) {
                // Recurse into subdirectories.
                addIntoIndex(htmls[i], indexWriter);
            } else if (htmlPath.endsWith(".html") || htmlPath.endsWith(".htm")) {
                // Index only HTML files.
                addDocument(htmlPath, indexWriter);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Upvotes: 2

Views: 196

Answers (2)

SkyWalker

Reputation: 29150

My problem is finally solved. I am indexing HTML files that all have the following structure:

<html>
<head>..</head>
<body>...</body>
</html>

After adding the following two lines to the head section of each file, the problem was solved on my Java 1.4.02 installation.

<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta http-equiv="content-script-type" content="text/javascript; charset=UTF-8"/>
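To illustrate why a charset declaration can matter, here is a minimal sketch (not the Lucene demo parser's actual code, which I have not checked) showing that the same bytes decode to different text depending on the charset assumed. The class name CharsetDemo and the helper decode are hypothetical names chosen for this example; only standard Java 1.4 calls are used.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of Lucene or the original project.
class CharsetDemo {

    /** Decodes raw bytes with the named charset (hypothetical helper). */
    static String decode(byte[] raw, String charsetName) {
        try {
            return new String(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName);
        }
    }

    public static void main(String[] args) {
        // The letter a-umlaut (U+00E4) encoded as UTF-8 is the two bytes C3 A4.
        byte[] raw = new byte[] { (byte) 0xC3, (byte) 0xA4 };

        // Read as UTF-8, the two bytes form one character.
        System.out.println(decode(raw, "UTF-8").length());      // prints 1

        // Read as ISO-8859-1, each byte becomes its own (wrong) character.
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 2
    }
}
```

If a reader of the file guesses the wrong charset, Scandinavian letters turn into two unrelated characters, which would be consistent with the garbled text described in the question.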

Special thanks to my project manager, Peter Lawrey, and txtechhelp.

Upvotes: 1

txtechhelp

Reputation: 6777

Try this link, which has some relevant answers: https://forums.oracle.com/thread/1288135

You can also look here for some other possibilities: Setting java locale settings

Basically, it sounds like you just need to ensure the right locale is configured to get the correct Unicode strings.
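As a rough way to check this, the sketch below prints the locale and encoding settings the JVM picked up, then lists a directory with every non-ASCII character written as a backslash-u escape. That should help tell apart a console that merely cannot display the names from names that were decoded incorrectly. The class name NameCheck and its helpers are hypothetical, and the code sticks to Java 1.4 syntax (no generics, StringBuffer instead of StringBuilder).

```java
import java.io.File;
import java.util.Locale;

// Hypothetical diagnostic class; not part of the original project.
class NameCheck {

    /** Reports the locale and encoding settings seen by this JVM. */
    static String describe() {
        StringBuffer sb = new StringBuffer();
        sb.append("default locale = ").append(Locale.getDefault()).append('\n');
        sb.append("file.encoding  = ").append(System.getProperty("file.encoding")).append('\n');
        sb.append("user.language  = ").append(System.getProperty("user.language")).append('\n');
        return sb.toString();
    }

    /** Replaces every non-ASCII char with a four-digit hex escape. */
    static String escape(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                String hex = Integer.toHexString(c);
                sb.append("\\u");
                // Left-pad the hex value to four digits.
                for (int k = hex.length(); k < 4; k++) {
                    sb.append('0');
                }
                sb.append(hex);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());

        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] entries = dir.listFiles();
        if (entries == null) {
            System.out.println("Cannot list: " + dir.getAbsolutePath());
            return;
        }
        for (int i = 0; i < entries.length; i++) {
            System.out.println(escape(entries[i].getName())
                    + "  isDirectory=" + entries[i].isDirectory());
        }
    }
}
```

If the escaped names show the expected code points (for example 00e4 for a-umlaut) the names were decoded correctly and only the console display is at fault; if they show replacement or question-mark characters, the locale the JVM started under is the likely suspect.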

Upvotes: 0
