Reputation: 5556
I want to write a Stream to a file. However, the Stream is big (a few GB when written to a file), so I want to process it in parallel. At the end of the process, I would like to write it to a file (I am using FileWriter).
I would like to ask whether that could cause any problems in the file.
Here is my code:
```java
public static void writeStreamToFile(Stream<String> ss, String fileURI) {
    try (FileWriter wr = new FileWriter(fileURI)) {
        ss.forEach(line -> {
            try {
                if (line != null) {
                    wr.write(line + "\n");
                }
            } catch (Exception ex) {
                System.err.println("error when write file");
            }
        });
    } catch (IOException ex) {
        Logger.getLogger(OaStreamer.class.getName()).log(Level.SEVERE, null, ex);
    }
}

Stream<String> ss = Files.lines(path).parallel()
        .map(x -> dosomething(x))
        .map(x -> dosomethingagain(x));
writeStreamToFile(ss, "path/to/output.csv");
```
Upvotes: 1
Views: 3784
Reputation: 1192
Yes, it is OK to use FileWriter the way you are using it. I have some other approaches which may be helpful to you.
As you are dealing with large files, FileChannel can be faster than standard IO. The following code writes a String to a file using FileChannel:
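```java
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import org.junit.Test;

public class FileChannelWriteTest {
    // Hypothetical target path; the original snippet assumes a fileName field
    private final String fileName = "fileChannelTest.txt";

    @Test
    public void givenWritingToFile_whenUsingFileChannel_thenCorrect()
      throws IOException {
        String value = "Hello";
        // try-with-resources closes the channel and the file even if write() fails
        try (RandomAccessFile stream = new RandomAccessFile(fileName, "rw");
             FileChannel channel = stream.getChannel()) {
            byte[] strBytes = value.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buffer = ByteBuffer.allocate(strBytes.length);
            buffer.put(strBytes);
            buffer.flip(); // prepare the buffer to be drained by the channel
            while (buffer.hasRemaining()) {
                channel.write(buffer); // a single write() is not guaranteed to drain the buffer
            }
        }

        // verify
        try (RandomAccessFile reader = new RandomAccessFile(fileName, "r")) {
            assertEquals(value, reader.readLine());
        }
    }
}
```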
Reference : https://www.baeldung.com/java-write-to-file
You can also use Files.write with stream operations by converting the Stream to an Iterable, as below:
```java
Files.write(Paths.get(filepath), (Iterable<String>) yourstream::iterator);
```
For example:
```java
Files.write(Paths.get("/dir1/dir2/file.txt"),
        (Iterable<String>) IntStream.range(0, 1000).mapToObj(String::valueOf)::iterator);
```
If you have a stream of custom objects, you can always add a .map(Object::toString) step to apply the toString() method.
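A minimal sketch of the custom-object case, assuming a hypothetical Point class whose toString() produces one CSV line; the class and method names are illustrative only. It combines the .map(Object::toString) step with the same Iterable cast used above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class WriteObjects {
    // Hypothetical custom type used only for illustration
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return x + "," + y; // one CSV line per object
        }
    }

    static void writePoints(Stream<Point> points, Path out) throws IOException {
        // map(Object::toString) turns each object into a line; the method reference
        // is cast to Iterable so that Files.write(Path, Iterable, ...) accepts it
        Stream<String> lines = points.map(Object::toString);
        Files.write(out, (Iterable<String>) lines::iterator);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("points", ".csv");
        writePoints(Stream.of(new Point(1, 2), new Point(3, 4)), out);
        System.out.println(Files.readAllLines(out)); // prints [1,2, 3,4]
    }
}
```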
Upvotes: 2
Reputation: 1598
As others have mentioned, this approach should work; however, you should question whether it is the best method. Writing to a file is an operation shared between threads, which means you are introducing thread contention.
While it is easy to think that having multiple threads will speed up performance, in the case of I/O operations the opposite is often true. Remember that I/O operations are bounded by the throughput of the underlying device, so more threads will not increase performance. In fact, this contention will slow down access to the shared resource because of the constant locking and unlocking needed to write to it.
The bottom line is that only one thread can write to a file at a time, so parallelizing write operations is counterproductive.
Consider using multiple threads to handle your CPU-intensive tasks and having all threads post to a queue/buffer. A single thread can then pull from the queue and write to your file. This solution (and more detail) was suggested in this answer.
Check out this article for more info on thread contention and locks.
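A minimal sketch of the queue-based, single-writer approach described above. It assumes a bounded ArrayBlockingQueue and a "poison pill" sentinel to signal the end of input; both are illustrative choices, not taken from the linked answer. The cpuWork parameter is a hypothetical stand-in for the question's dosomething/dosomethingagain steps:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

public class QueueWriter {
    // End-of-input sentinel, compared by identity; new String() makes it a unique instance
    private static final String POISON = new String("EOF");

    static void process(Stream<String> input, UnaryOperator<String> cpuWork, Path out)
            throws InterruptedException {
        // Bounded queue: producers block when it is full, so memory use stays bounded
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

        // Single consumer thread: the only code that touches the file
        Thread writer = new Thread(() -> {
            try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                for (String line = queue.take(); line != POISON; line = queue.take()) {
                    w.write(line);
                    w.newLine();
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // Producers: the CPU-intensive work runs in parallel, results go to the queue
        input.parallel().map(cpuWork).forEach(line -> {
            try {
                queue.put(line);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        });

        queue.put(POISON); // tell the writer no more lines are coming
        writer.join();     // wait until everything has been flushed and closed
    }
}
```

The lines still arrive in nondeterministic order. This sketch also does not report a failure in the writer thread back to the producers: if the writer dies while the queue is full, the producers would block forever, so a real implementation would need some way to cancel the work in that case.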
Upvotes: 1
Reputation: 140533
It is not a problem as long as it is okay for the file to have the lines in random order. You are processing the content in parallel, not in sequence, so there is no guarantee about the order in which lines reach the writer.
That is the only thing to keep in mind here.
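If the original line order does matter, one option is to keep the parallel map() steps and replace forEach with forEachOrdered, which runs the action for one element at a time in the stream's encounter order. A minimal sketch, assuming the output goes through Files.newBufferedWriter; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class OrderedWrite {
    static void writeInOrder(Stream<String> lines, Path out) throws IOException {
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // forEachOrdered respects encounter order even if upstream map() stages
            // run in parallel; the action is never run concurrently for two elements
            lines.forEachOrdered(line -> {
                try {
                    w.write(line);
                    w.write(System.lineSeparator());
                } catch (IOException e) {
                    // lambdas cannot throw checked exceptions, so wrap it
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Files.lines produces an ordered stream, so with forEachOrdered the output would follow the input file's line order. The trade-off is that elements finished early may have to wait for earlier ones, which can reduce the parallel speedup.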
Upvotes: 0