Reputation: 30885
I need to read ~50 files on every server start and place each text file's representation into memory. Each text file will have its own string (which is the best type to use for the string holder?).
What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?
Thanks
Upvotes: 24
Views: 12185
Reputation: 61526
A memory mapped file will be fastest... something like this:
final File file;
final FileInputStream fin;
final FileChannel channel;
final MappedByteBuffer buffer;
file = new File(fileName);
fin = new FileInputStream(file);
channel = fin.getChannel();
// map the whole file read-only; the OS pages the contents in on demand
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
and then proceed to read from the byte buffer.
This will be significantly faster than FileInputStream or FileReader.
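Since the question is about text, here is a minimal sketch of turning a mapped buffer into a String. The class name MappedText and the choice to pass the encoding in as a parameter are assumptions for illustration, not part of the answer above. It also assumes each file is smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a file read-only and decodes it into a String.
final class MappedText {
    static String read(Path path, Charset encoding) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Map the whole file; assumes the file is smaller than 2 GB.
            MappedByteBuffer buffer = channel.map(MapMode.READ_ONLY, 0, channel.size());
            // Charset.decode converts the mapped bytes into characters.
            return encoding.decode(buffer).toString();
        }
    }
}
```

The decode step copies the characters onto the heap, so the String stays valid after the channel is closed.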
EDIT:
After a bit of investigation it turns out that, depending on your OS, you might be better off using a new BufferedInputStream(new FileInputStream(file)) instead. However, reading the whole thing all at once into a char[] the size of the file sounds like the worst way.
So BufferedInputStream should give roughly consistent performance on all platforms, while the memory mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical, you should test your code and see what works best.
EDIT:
Ok here are some tests (the first one is done twice to get the files into the disk cache).
I ran it on the rt.jar class files, extracted to the hard drive, under Windows 7 beta x64. That is 16784 files with a total of 94,706,637 bytes.
First the results...
(remember the first is repeated to get the disk cache setup)
ArrayTest
ArrayTest
DataInputByteAtATime
DataInputReadFully
MemoryMapped
Here is the code...
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main
{
public static void main(final String[] argv)
{
ArrayTest.main(argv);
ArrayTest.main(argv);
DataInputByteAtATime.main(argv);
DataInputReadFully.main(argv);
MemoryMapped.main(argv);
}
}
abstract class Test
{
public final void run(final File root)
{
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,
final Set<File> files)
{
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren)
{
if(child.isFile())
{
files.add(child);
}
else
{
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files)
{
long size;
size = 0;
for(final File file : files)
{
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
InputStream stream;
stream = null;
try
{
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do
{
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
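do
{
// read() returns the byte count for this call only, or -1 at end of stream
final int read = stream.read(data);
if(read < 0)
{
break;
}
// add just this chunk's bytes to the running sum
for(int i = 0; i < read; i++)
{
sum += data[i];
}
size += read;
}
while(size != fileSize);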
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
FileInputStream stream;
stream = null;
try
{
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
Upvotes: 32
Reputation: 924
After searching Google for existing tests on IO speed in Java, I must say TofuBear's test case completely opened my eyes. You have to run his test on your own platform to see what is fastest for you.
After running his test, and adding a few of my own (Credit to TofuBear for posting his original code), it appears you may get even more speed by using your own custom buffer vs. using the BufferedInputStream.
To my dismay the NIO ByteBuffer did not perform well.
NOTE: The static byte[] buffer shaved off a few ms, but the static ByteBuffers actually increased the time to process! Is there anything wrong with the code?
I added a few tests:
1. ArrayTest_CustomBuffering (read data directly into my own buffer)
2. ArrayTest_CustomBuffering_StaticBuffer (read data into a static buffer that is created only once in the beginning)
3. FileChannelArrayByteBuffer (use NIO ByteBuffer and wrapping your own byte[] array)
4. FileChannelAllocateByteBuffer (use NIO ByteBuffer with .allocate)
5. FileChannelAllocateByteBuffer_StaticBuffer (same as 4 but with a static buffer)
6. FileChannelAllocateDirectByteBuffer (use NIO ByteBuffer with .allocateDirect)
7. FileChannelAllocateDirectByteBuffer_StaticBuffer (same as 6 but with a static buffer)
Here are my results, using Windows Vista and jdk1.6.0_13 on the extracted rt.jar:
ArrayTest
time = 2075
bytes = 2120336424
ArrayTest
time = 2044
bytes = 2120336424
ArrayTest_CustomBuffering
time = 1903
bytes = 2120336424
ArrayTest_CustomBuffering_StaticBuffer
time = 1872
bytes = 2120336424
DataInputByteAtATime
time = 2668
bytes = 2120336424
DataInputReadFully
time = 2028
bytes = 2120336424
MemoryMapped
time = 2901
bytes = 2120336424
FileChannelArrayByteBuffer
time = 2371
bytes = 2120336424
FileChannelAllocateByteBuffer
time = 2356
bytes = 2120336424
FileChannelAllocateByteBuffer_StaticBuffer
time = 2668
bytes = 2120336424
FileChannelAllocateDirectByteBuffer
time = 2512
bytes = 2120336424
FileChannelAllocateDirectByteBuffer_StaticBuffer
time = 2590
bytes = 2120336424
My hacked version of TofuBear's code:
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main {
public static void main(final String[] argv) {
ArrayTest.mainx(argv);
ArrayTest.mainx(argv);
ArrayTest_CustomBuffering.mainx(argv);
ArrayTest_CustomBuffering_StaticBuffer.mainx(argv);
DataInputByteAtATime.mainx(argv);
DataInputReadFully.mainx(argv);
MemoryMapped.mainx(argv);
FileChannelArrayByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer_StaticBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer_StaticBuffer.mainx(argv);
}
}
abstract class Test {
static final int BUFF_SIZE = 20971520;
static final byte[] StaticData = new byte[BUFF_SIZE];
static final ByteBuffer StaticBuffer =ByteBuffer.allocate(BUFF_SIZE);
static final ByteBuffer StaticDirectBuffer = ByteBuffer.allocateDirect(BUFF_SIZE);
public final void run(final File root) {
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,final Set<File> files) {
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren) {
if(child.isFile()) {
files.add(child);
}
else {
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files) {
long size;
size = 0;
for(final File file : files) {
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
int soFar;
int sum;
final int fileSize;
stream = new FileInputStream(file);
fileSize = (int)file.length();
soFar = 0;
do {
soFar += stream.read(StaticData, soFar, fileSize - soFar);
}
while(soFar != fileSize);
sum = 0;
for(int i=0;i<fileSize;i++) {
sum += StaticData[i];
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
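do {
// read() returns the byte count for this call only, or -1 at end of stream
final int read = stream.read(data);
if(read < 0) {
break;
}
// add just this chunk's bytes to the running sum
for(int i = 0; i < read; i++) {
sum += data[i];
}
size += read;
}
while(size != fileSize);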
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelArrayByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelArrayByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
buffer = ByteBuffer.wrap(data);
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocate((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocateDirect((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticBuffer.clear();
StaticBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticBuffer);
StaticBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticDirectBuffer.clear();
StaticDirectBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticDirectBuffer);
StaticDirectBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticDirectBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
Upvotes: 1
Reputation: 328566
The most efficient way is:
1. Determine the length of the file (File.length())
2. Use new InputStreamReader(new FileInputStream(file), encoding) to read the file into a buffer
3. Create the string with new String(buffer)
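The steps above can be sketched roughly as follows. The class name WholeFileReader and the method signature are assumptions for illustration. The sketch also assumes the file is smaller than 2 GB and that the encoding never produces more chars than the file has bytes, which holds for UTF-8, UTF-16 and ISO-8859-1. With an encoding that breaks this assumption the text would be cut off.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper: reads a whole text file through one pre-sized char buffer.
final class WholeFileReader {
    static String readFile(File file, Charset encoding) throws IOException {
        // Assumption: the byte length is an upper bound on the number of chars.
        char[] buffer = new char[(int) file.length()];
        int filled = 0;
        try (Reader reader = new InputStreamReader(new FileInputStream(file), encoding)) {
            // read() may return before the buffer is full, so keep reading at an offset.
            while (filled < buffer.length) {
                int n = reader.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
        }
        // Only the filled part of the buffer belongs to the text.
        return new String(buffer, 0, filled);
    }
}
```

Passing the filled count to the String constructor matters for multi-byte encodings, where the buffer ends up larger than the decoded text.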
If you need to search&replace once at startup, use String.replaceAll().
If you need to do it repeatedly, you may consider using StringBuilder. It has no replaceAll() but you can use it to manipulate the character array in place (-> no allocation of memory).
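A minimal sketch of a literal search-and-replace on a StringBuilder, built from its indexOf() and replace() methods. The class and method names are made up for illustration. Note that the builder only avoids reallocation while the text fits its current capacity.

```java
// Hypothetical helper: literal (non-regex) replace-all on a StringBuilder.
final class InPlaceReplace {
    // Replaces every occurrence of target and returns how many were replaced.
    static int replaceAll(StringBuilder text, String target, String replacement) {
        if (target.isEmpty()) {
            return 0; // nothing sensible to search for
        }
        int count = 0;
        int index = text.indexOf(target);
        while (index >= 0) {
            text.replace(index, index + target.length(), replacement);
            count++;
            // Resume after the inserted text so a replacement that contains
            // the target is not matched again.
            index = text.indexOf(target, index + replacement.length());
        }
        return count;
    }
}
```

For example, calling replaceAll(sb, "a", "aa") on a builder holding "a-b-a" leaves "aa-b-aa" in the builder.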
That said:
There is no reason to spend a lot of time making this code run fast if it takes just 0.1s to execute.
If you still have a performance problem, consider putting all the text files into a JAR, adding it to the classpath and using Class.getResourceAsStream() to read the files. Loading things from the Java classpath is highly optimized.
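A minimal sketch of reading a text resource from the classpath, assuming Java 9 or later for InputStream.readAllBytes(). The class name ResourceText and the choice to return null for a missing resource are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: loads a classpath resource as text.
final class ResourceText {
    // Returns the resource content, or null if the resource does not exist.
    static String read(Class<?> anchor, String resourcePath, Charset encoding) {
        // A leading '/' makes the path absolute within the classpath.
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return null; // getResourceAsStream returns null when nothing is found
            }
            return new String(in.readAllBytes(), encoding);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```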
Upvotes: 5
Reputation: 533492
You should be able to read all the files in under a second using standard tools like Commons IO FileUtils.readFileToString(File).
You can use writeStringToFile(File, String) to save the modified file as well.
http://commons.apache.org/io/api-release/index.html?org/apache/commons/io/FileUtils.html
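If adding Commons IO is not an option, the JDK has similar one-liners. The sketch below swaps in java.nio.file.Files.readString and Files.writeString (Java 11 or later) in place of the FileUtils calls. The class name, the method name and the fixed UTF-8 encoding are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: read a file, replace literal text, write it back.
final class JdkReadWrite {
    static void replaceInFile(Path path, String target, String replacement) throws IOException {
        // Read the whole file as one String (assumes UTF-8 content).
        String text = Files.readString(path, StandardCharsets.UTF_8);
        // String.replace does a literal, non-regex replacement.
        Files.writeString(path, text.replace(target, replacement), StandardCharsets.UTF_8);
    }
}
```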
BTW: 50 is not a large number of files. A typical PC can have 100K files or more.
Upvotes: 0
Reputation: 62769
Any conventional approach is going to be limited in speed. I'm not sure you'll see much of a difference from one approach to the next.
I would concentrate on business tricks that could make the entire operation faster.
For instance, if you read all the files and stored them in a single file along with the timestamp of each original file, then you could check whether any of the files have changed without actually opening them (a simple cache, in other words).
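The timestamp check could look something like this sketch. The class name TimestampCache is hypothetical, and the timestamps are kept in an in-memory map only. Persisting them next to the combined file, as described above, is left out.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the last modification time seen per file.
final class TimestampCache {
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Returns true if the file was never seen before or its timestamp changed.
    // Only file metadata is consulted; the file itself is not opened.
    boolean hasChanged(File file) {
        long current = file.lastModified();
        Long previous = lastSeen.put(file.getPath(), current);
        return previous == null || previous != current;
    }
}
```

Only files for which hasChanged returns true would then need to be re-read.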
If your problem was getting a GUI up quickly, you might find a way to open the files in a background thread after your first screen was displayed.
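A minimal sketch of loading the files on a background thread with CompletableFuture, assuming Java 11 or later and UTF-8 files. The class name BackgroundLoader and the Map result type are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: reads all files off the calling thread.
final class BackgroundLoader {
    static CompletableFuture<Map<Path, String>> loadAsync(List<Path> paths) {
        // supplyAsync runs the work on the common fork-join pool by default.
        return CompletableFuture.supplyAsync(() -> {
            Map<Path, String> texts = new HashMap<>();
            for (Path p : paths) {
                try {
                    texts.put(p, Files.readString(p, StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return texts;
        });
    }
}
```

The caller can show the first screen right away and call join() on the returned future only when the texts are actually needed.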
The OS can be pretty good with files. If this is part of a batch process (no user I/O), you could start with a batch file that appends all the files into one big one before launching Java, using something like this:
echo "file1" > file.all
type "file1" >> file.all
echo "file2" >> file.all
type "file2" >> file.all
Then just open file.all (I'm not sure how much faster this will be, but it's probably the fastest approach for the conditions I just stated)
I guess I'm just saying that, more often than not, a solution to a speed issue requires expanding your viewpoint a little and completely rethinking the solution using new parameters. Modifications of an existing algorithm usually only give minor speed enhancements at the cost of readability.
Upvotes: 0
Reputation: 7376
It depends a lot on the internal structure of your text files and what you intend to do with them.
Are the files key-value dictionaries (i.e. "properties" files)? XML? JSON? You have standard structures for those.
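For the key-value case, a minimal sketch using java.util.Properties. The class name PropertiesExample is hypothetical, and the text is parsed from a String here only to keep the example self-contained. A FileReader or InputStreamReader would be passed to load() the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: parses "key=value" text with the standard Properties class.
final class PropertiesExample {
    static Properties parse(String text) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            // load() handles comments, escapes and whitespace around the separator.
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```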
If they have a formal structure you may also use JavaCC to build an object representation of the files.
Otherwise, if they are just blobs of data, well, read the files and put them in a String.
Edit: about search&replace, just use String's replaceAll function.
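One thing to keep in mind is that replaceAll treats its first argument as a regular expression and gives '$' and '\' special meaning in the replacement. The sketch below shows a literal variant built on Pattern.quote and Matcher.quoteReplacement. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaceAll with both arguments taken literally.
final class ReplaceDemo {
    static String replaceLiteral(String text, String target, String replacement) {
        // Pattern.quote disables regex metacharacters in the search text;
        // Matcher.quoteReplacement disables '$' and '\' in the replacement.
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }
}
```

With the plain call, "1.5 and 1x5".replaceAll("1.5", "N") gives "N and N" because the dot matches any character, while replaceLiteral gives "N and 1x5". String.replace(CharSequence, CharSequence) is the built-in way to get the same literal behavior.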
Upvotes: 1