Reputation: 6365
I know there were several similar threads here and on the net but I seem to be doing something wrong, I guess. My task is easy - write (and later read) a big array of integers (int [] or ArrayList or what you think is best) to a file. The faster the better. My concrete array has about 4.5M integers in it and currently the times are for example (in ms):
This is unacceptable and I guess the times should be much lower. What am I doing wrong? I don't need the fastest method on earth but getting these times to about 5 - 15 seconds (less is welcome but not mandatory) is my goal.
My current code:
long start = System.nanoTime();
Node trie = dawg.generateTrie("dict.txt");
long afterGeneratingTrie = System.nanoTime();
ArrayList<Integer> array = dawg.generateArray(trie);
long afterGeneratingArray = System.nanoTime();
// try-with-resources closes (and so flushes) the stream before the timer is read
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("test.txt")))
{
    out.writeObject(array);
}
catch (Exception e)
{
    Logger.getLogger(DawgTester.class.getName()).log(Level.SEVERE, null, e);
}
long afterSavingArray = System.nanoTime();
ArrayList<Integer> read = new ArrayList<Integer>();
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("test.txt")))
{
    // unchecked cast: readObject() returns Object
    read = (ArrayList<Integer>) in.readObject();
}
catch (Exception e)
{
    Logger.getLogger(DawgTester.class.getName()).log(Level.SEVERE, null, e);
}
long afterLoadingArray = System.nanoTime();
// nanoseconds * 0.000001 = milliseconds
System.out.println("Generating trie: " + 0.000001 * (afterGeneratingTrie - start));
System.out.println("Generating array: " + 0.000001 * (afterGeneratingArray - afterGeneratingTrie));
System.out.println("Saving array: " + 0.000001 * (afterSavingArray - afterGeneratingArray));
System.out.println("Loading array: " + 0.000001 * (afterLoadingArray - afterSavingArray));
Upvotes: 0
Views: 2132
Reputation: 29656
Something like the following is probably a fairly fast option. You should also use an actual array (int[]) rather than an ArrayList<Integer> if your concern is reducing overhead.
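// Note: the question reads its word list from dict.txt, so point this at a separate output file.
final Path path = Paths.get("dict.txt");
...
final int[] rsl = dawg.generateArray(trie);
// 4 bytes per int; the int view shares storage with buf but keeps its own position and limit
final ByteBuffer buf = ByteBuffer.allocateDirect(rsl.length << 2);
final IntBuffer buf_i = buf.asIntBuffer();
buf_i.put(rsl);
try (final WritableByteChannel out = Files.newByteChannel(path,
        StandardOpenOptions.CREATE, StandardOpenOptions.WRITE,
        StandardOpenOptions.TRUNCATE_EXISTING)) {
    // buf's position is still 0 and its limit equals its capacity, so every int is written
    do {
        out.write(buf);
    } while (buf.hasRemaining());
}
buf.clear();
try (final ReadableByteChannel in = Files.newByteChannel(path,
        StandardOpenOptions.READ)) {
    while (buf.hasRemaining()) {
        // read() returns -1 at end of file; fail instead of looping forever on a short file
        if (in.read(buf) < 0) {
            throw new EOFException("file is shorter than expected");
        }
    }
}
// reset the int view to position 0 and copy the values back into the array
buf_i.clear();
buf_i.get(rsl);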
Upvotes: 0
Reputation: 53694
Don't use Java serialization. It is very powerful and robust, but not particularly speedy (or compact). Use a simple DataOutputStream and call writeInt() for each element. (Make sure you put a BufferedOutputStream between the DataOutputStream and the FileOutputStream.)
If you want to pre-size your array on read, write the array length as your first int.
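A minimal sketch of that approach, assuming the data is held in an int[]. The class name IntArrayFile and its two methods are made up for illustration and are not part of the question's code:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper; the name and signatures are illustrative only.
class IntArrayFile {

    // Writes the array length first, then each element as a 4-byte big-endian int.
    static void write(File file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(values.length);
            for (int value : values) {
                out.writeInt(value);
            }
        }
    }

    // Reads the length prefix, allocates the array once, then fills it.
    static int[] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }
}
```

Usage would be along the lines of IntArrayFile.write(new File("test.txt"), array) followed by int[] read = IntArrayFile.read(new File("test.txt")). The buffered streams are what keep each writeInt() and readInt() call from turning into its own small file operation.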
Upvotes: 3