Reputation: 7141
Im having a problem with a BufferedWriter. I am reading in a 50,000 word wordlist, using a stemming algorithm and creating a new wordlist that just contains the word stems. Instead of this new file containing any stems however it litrally just contains:
-
Here is my code:
public static void main(String[] args) {
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
Stemmer s= new Stemmer();
s.addword(word);
s.stem();
String stem =s.toString();
if(!db.contains(stem)){
db.add(stem);
writer.write(stem);
//System.out.println(stem);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
}
The output i get on the console is:
Reduced file from 58110 words to 28201
So i know its working. Ive also tried changing writer.write(stem);
to writer.write("hi");
and I still get the same output in newwordlist.txt
.
I know its no fault of the Stemmer class, Ive tried outputting the stem string (where I commented the code) and that produced the correct output to console so the fault must be with the writer but I dont understand what.
Edit 1
I simplified to code to:
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist.txt"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
if(!db.contains(word)){
db.add(word);
writer.write("hi");
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
Now i get console output:
Reduced file from 58110 words to 58109
But the output file is still blank
Upvotes: 1
Views: 572
Reputation: 718798
I would expect the code as given in the Question to produce a file that consists of one line, consisting of all of the "stems" concatenated. (Or in the "hi" version, one line consisting of "hihihi...." repeated a large number of times.)
It is conceivable that whatever you are using to view the file cannot cope with an input file that consists of many thousands of characters ... and no end-of-line.
Change
writer.write(stem);
to
writer.write(stem);
writer.write(EOL);
where EOL is the platform specific end-of-line sequence.
Assuming you are using Java 7, it would be better to use try-with-resource to make sure that the output stream is always closed / flushed, even if there is an error:
public static void main(String[] args) {
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(new FileInputStream("src/wordlist"), "UTF-8"));
BufferedWriter writer = new BufferedWriter(new FileWriter(
new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<>();
String EOL = System.getProperty("line.separator");
String word;
int i = 0;
while ((word = reader.readLine()) != null) {
i++;
Stemmer s = new Stemmer();
s.addword(word);
s.stem();
String stem = s.toString();
if (db.add(stem)) {
writer.write(stem);
writer.write(EOL);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
} catch (IOException e1) {
e1.printStackTrace();
}
}
(I tidied up a couple of other things too ...)
Upvotes: 1
Reputation: 827
According to the Java documentation you need to use BufferedWriter.write() as follows:
write(string,offset,length);
so try:
writer.write(stem,0,stem.length());
Upvotes: 1
Reputation: 18445
Works for me. Is this your exact class, did you edit it before pasting in?
wordlist;
the
cat
sat
on
the
mat
newwordlist.txt;
thecatsatonmat
My Stemmer
just returns the word you gave it.
public class Stemmer {
private String word;
public void addword(String word) {
this.word = word;
}
public void stem() {
// TODO Auto-generated method stub
}
@Override
public String toString() {
return word;
}
}
Upvotes: 1
Reputation: 533510
When I run your edited code I get one line with
hihihihihihihihihihihihihi ............
As expected.
Perhaps you intended to add newline characters line this.
if(!db.contains(word)){
db.add(word);
writer.write(word);
writer.write("\n");
}
Upvotes: 0
Reputation: 3842
The reason you get Reduced file from 58110 words to 58109
console output is that you only have one System.out.println
statement after the loop.
The writer should write words only to the output file src/newwordlist.txt
and not to the console. If you want your program to output words to the console add additional System.out.println(word)
after writer.write("hi");
Hope this helps...
Upvotes: 1