Reputation: 29150
I am using Java 1.4 (a client requirement) together with lucene-core-2.9.2.jar and lucene-demos-2.9.2.jar, and I build with Ant. Indexing works fine for all directories except those whose names contain Unicode or Scandic characters.
When I list the directory contents using listFiles(), every entry is returned, but the Unicode names are shown as blocks. When the code then checks each entry with isDirectory(), it does not recognize the folders whose names are in other languages (containing Unicode or Scandic characters), so they are not indexed.
How can I solve this problem so that names with Unicode and Scandic characters are handled?
If I use Java 6 or 7, it works well. Because the client requires Java 1.4, please do not suggest switching to Java 5, 6 or 7; I am looking for other solutions. To make the problem easier to understand, I have added my code below:
public void addIntoIndex(File dir, IndexWriter indexWriter) {
    try {
        System.out.println("Now in addIntoIndex");
        /* The "Release_Notes" folder is excluded from indexing. */
        if (dir.getName().equals("Release_Notes") && this.searchOption.equals("systemHelp")) {
            System.out.println("'Release_Notes' folder will be excluded for indexing.");
            return;
        }
        File[] htmls = dir.listFiles();
        // listFiles() returns null if dir is not a directory or cannot be read.
        if (htmls == null) {
            System.out.println("Cannot list: " + dir.getAbsolutePath());
            return;
        }
        for (int i = 0; i < htmls.length; i++) {
            if (htmls[i].isDirectory()) {
                // Recurse into subdirectories.
                addIntoIndex(htmls[i], indexWriter);
                continue;
            }
            String htmlPath = htmls[i].getAbsolutePath();
            if (htmlPath.endsWith(".html") || htmlPath.endsWith(".htm")) {
                addDocument(htmlPath, indexWriter);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
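One way to narrow this down is to check whether the names returned by listFiles() are already wrong, or whether only the console fails to display them. The following is a minimal sketch that avoids Java 5+ APIs; the class name NameDump and the helper toUnicodeEscapes are hypothetical and not part of the project code.

```java
import java.io.File;

class NameDump {
    /** Returns s with every char outside printable ASCII written as a 4-digit hex escape. */
    static String toUnicodeEscapes(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                out.append(c);
            } else {
                String hex = Integer.toHexString(c);
                out.append("\\u");
                // Left-pad the hex value to four digits.
                for (int pad = hex.length(); pad < 4; pad++) {
                    out.append('0');
                }
                out.append(hex);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory (assumption for the demo).
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] entries = dir.listFiles();
        if (entries == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        for (int i = 0; i < entries.length; i++) {
            System.out.println(toUnicodeEscapes(entries[i].getName())
                    + "  isDirectory=" + entries[i].isDirectory());
        }
    }
}
```

If the output shows the expected code points (for example 00e5 for å), the names are probably intact and only the console display is affected. If it shows question marks or fffd instead, the names were most likely already damaged when the JVM decoded them, which would also explain why isDirectory() cannot find the folder.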
Upvotes: 2
Views: 196
Reputation: 29150
My problem is finally solved. All the HTML files I am indexing have this format:
<html>
<head>..</head>
<body>...</body>
</html>
After I added the following two lines to the head section, the problem was solved on my Java 1.4.02 installation.
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta http-equiv="content-script-type" content="text/javascript; charset=UTF-8"/>
Special thanks to my project manager, Peter Lawrey and txtechhelp.
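For reference, the Java-side counterpart of declaring a charset is to decode the file with an explicit encoding instead of the platform default. This is only a sketch under the assumption that the HTML files are saved as UTF-8; the class name Utf8Read and the method readUtf8 are hypothetical, and this is not necessarily what the Lucene demo parser does internally.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class Utf8Read {
    /** Reads the whole file, decoding its bytes as UTF-8 regardless of the platform default. */
    static String readUtf8(File file) throws IOException {
        Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"));
        try {
            StringBuffer sb = new StringBuffer();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of an HTML file as the first argument.
        System.out.println(readUtf8(new File(args[0])).length() + " chars read");
    }
}
```

A plain FileReader would use the platform default encoding, which can differ between machines and Java versions.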
Upvotes: 1
Reputation: 6777
Try this link, which has some relevant answers for you: https://forums.oracle.com/thread/1288135
You can also look here for some other possibilities: Setting java locale settings
Basically, it sounds like you just need to make sure the right locale is configured so that you get the correct Unicode strings.
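To see what the JVM is actually using, you could print the default locale and encoding from the same environment that Ant runs in. This is a rough sketch; the class name LocaleInfo is hypothetical, and sun.jnu.encoding is an implementation-specific property that may print null on some JVMs.

```java
import java.util.Locale;

class LocaleInfo {
    /** Summarizes the locale and encoding settings the running JVM picked up. */
    static String describe() {
        return "locale=" + Locale.getDefault()
                + ", file.encoding=" + System.getProperty("file.encoding")
                // Implementation-specific; may be null on some JVMs.
                + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On Unix-like systems these values typically follow the LANG or LC_ALL environment of the process that starts the JVM, so comparing the output under Java 1.4 and Java 6 on the same machine may show where the two differ.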
Upvotes: 0