Reputation: 923
Currently I am using Scanner/FileReader with a while (hasNextLine()) loop. I don't think this method is very efficient. Is there another way to read a file with similar functionality?
public void Read(String file) {
    Scanner sc = null;
    try {
        sc = new Scanner(new FileReader(file));
        while (sc.hasNextLine()) {
            String text = sc.nextLine();
            String[] file_Array = text.split(" ", 3);
            if (file_Array[0].equalsIgnoreCase("case")) {
                // do something
            } else if (file_Array[0].equalsIgnoreCase("object")) {
                // do something
            } else if (file_Array[0].equalsIgnoreCase("classes")) {
                // do something
            } else if (file_Array[0].equalsIgnoreCase("function")) {
                // do something
            } else if (file_Array[0].equalsIgnoreCase("ignore")) {
                // do something
            } else if (file_Array[0].equalsIgnoreCase("display")) {
                // do something
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("Input file " + file + " not found");
        System.exit(1);
    } finally {
        sc.close();
    }
}
Upvotes: 42
Views: 129562
Reputation: 3204
I am wondering why no one has mentioned MappedByteBuffer. I believe it's the most efficient way to read large files, up to 2 GB per mapping.
Almost all projects require us to work with files. But what if the file is excessively large? Once the heap fills up, the JVM throws an OutOfMemoryError. Java offers the MappedByteBuffer class (Java NIO), which facilitates working with sizable files.
MappedByteBuffer establishes a virtual-memory mapping of the file. The contents of the file are loaded into virtual memory rather than the heap, and the JVM can read and write that memory without issuing OS-level read/write system calls for each access. Additionally, we can map a subset of a file rather than the entire file.
We obtain a MappedByteBuffer by mapping a file through a FileChannel. A FileChannel supports reading, writing, and manipulating a file, and is accessible via FileInputStream (for reading only), FileOutputStream (for writing), and RandomAccessFile.
To map a file, FileChannel provides the map() method. It requires three arguments:
- Map mode (PRIVATE, READ_ONLY, or READ_WRITE)
- Position (the offset within the file where the mapping starts)
- Size (the number of bytes to map)
Once the MappedByteBuffer is obtained, its get() and put() methods can be used to read and write data, respectively.
The file is located in the /resources directory, so we can load it using the following function:
Path getFileURIFromResources(String fileName) throws Exception {
    ClassLoader classLoader = getClass().getClassLoader();
    return Paths.get(classLoader.getResource(fileName).getPath());
}
This is how we read from the MappedByteBuffer:
CharBuffer charBuffer = null;
Path pathToRead = getFileURIFromResources("fileToRead.txt");
try (FileChannel fileChannel = (FileChannel) Files.newByteChannel(
        pathToRead, EnumSet.of(StandardOpenOption.READ))) {
    MappedByteBuffer mappedByteBuffer = fileChannel
        .map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
    if (mappedByteBuffer != null) {
        charBuffer = Charset.forName("UTF-8").decode(mappedByteBuffer);
    }
}
This is how we write:
CharBuffer charBuffer = CharBuffer
    .wrap("This will be written to the file");
Path pathToWrite = getFileURIFromResources("fileToWriteTo.txt");
try (FileChannel fileChannel = (FileChannel) Files
        .newByteChannel(pathToWrite, EnumSet.of(
            StandardOpenOption.READ,
            StandardOpenOption.WRITE,
            StandardOpenOption.TRUNCATE_EXISTING))) {
    MappedByteBuffer mappedByteBuffer = fileChannel
        .map(FileChannel.MapMode.READ_WRITE, 0, charBuffer.length());
    if (mappedByteBuffer != null) {
        mappedByteBuffer.put(
            Charset.forName("UTF-8").encode(charBuffer));
    }
}
Upvotes: 1
Reputation: 660
You can read the file in chunks if it contains millions of records; that avoids potential memory issues. You need to keep track of the last offset read so you can compute where the next chunk starts.
// lastOffset, counter, and batchSize are assumed to be maintained by the caller
try (FileReader reader = new FileReader(filePath);
     BufferedReader bufferedReader = new BufferedReader(reader)) {
    int pageOffset = lastOffset + counter;
    long skipRecords = (long) (pageOffset - 1) * batchSize;
    bufferedReader.lines()
        .skip(skipRecords)
        .limit(batchSize)
        .forEach(line -> {
            // process the line, e.g. print it
        });
}
Upvotes: 0
Reputation: 602
I made a gist comparing different methods:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;
public class Main {

    public static void main(String[] args) {
        String path = "resources/testfile.txt";

        measureTime("BufferedReader.readLine() into LinkedList", Main::bufferReaderToLinkedList, path);
        measureTime("BufferedReader.readLine() into ArrayList", Main::bufferReaderToArrayList, path);
        measureTime("Files.readAllLines()", Main::readAllLines, path);
        measureTime("Scanner.nextLine() into ArrayList", Main::scannerArrayList, path);
        measureTime("Scanner.nextLine() into LinkedList", Main::scannerLinkedList, path);
        measureTime("RandomAccessFile.readLine() into ArrayList", Main::randomAccessFileArrayList, path);
        measureTime("RandomAccessFile.readLine() into LinkedList", Main::randomAccessFileLinkedList, path);
        System.out.println("-----------------------------------------------------------");
    }

    private static void measureTime(String name, Function<String, List<String>> fn, String path) {
        System.out.println("-----------------------------------------------------------");
        System.out.println("run: " + name);
        long startTime = System.nanoTime();
        List<String> l = fn.apply(path);
        long estimatedTime = System.nanoTime() - startTime;
        System.out.println("lines: " + l.size());
        System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);
    }

    private static List<String> bufferReaderToLinkedList(String path) {
        return bufferReaderToList(path, new LinkedList<>());
    }

    private static List<String> bufferReaderToArrayList(String path) {
        return bufferReaderToList(path, new ArrayList<>());
    }

    private static List<String> bufferReaderToList(String path, List<String> list) {
        try {
            final BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                list.add(line);
            }
            in.close();
        } catch (final IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> readAllLines(String path) {
        try {
            return Files.readAllLines(Paths.get(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    private static List<String> randomAccessFileLinkedList(String path) {
        return randomAccessFile(path, new LinkedList<>());
    }

    private static List<String> randomAccessFileArrayList(String path) {
        return randomAccessFile(path, new ArrayList<>());
    }

    private static List<String> randomAccessFile(String path, List<String> list) {
        try {
            RandomAccessFile file = new RandomAccessFile(path, "r");
            String str;
            while ((str = file.readLine()) != null) {
                list.add(str);
            }
            file.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> scannerLinkedList(String path) {
        return scanner(path, new LinkedList<>());
    }

    private static List<String> scannerArrayList(String path) {
        return scanner(path, new ArrayList<>());
    }

    private static List<String> scanner(String path, List<String> list) {
        try {
            Scanner scanner = new Scanner(new File(path));
            while (scanner.hasNextLine()) {
                list.add(scanner.nextLine());
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        return list;
    }
}
run: BufferedReader.readLine() into LinkedList, lines: 1000000, estimatedTime: 0.105118655
run: BufferedReader.readLine() into ArrayList, lines: 1000000, estimatedTime: 0.072696934
run: Files.readAllLines(), lines: 1000000, estimatedTime: 0.087753316
run: Scanner.nextLine() into ArrayList, lines: 1000000, estimatedTime: 0.743121734
run: Scanner.nextLine() into LinkedList, lines: 1000000, estimatedTime: 0.867049885
run: RandomAccessFile.readLine() into ArrayList, lines: 1000000, estimatedTime: 11.413323046
run: RandomAccessFile.readLine() into LinkedList, lines: 1000000, estimatedTime: 11.423862897
BufferedReader is the fastest; Files.readAllLines() is also acceptable; Scanner is slow because of its regex parsing; and RandomAccessFile is unacceptably slow.
Upvotes: 28
Reputation: 560
Just updating this thread: now we have Java 8 to do this job:
List<String> lines = Files.readAllLines(Paths.get(file_path));
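For large files, a lazier alternative from the same API is Files.lines(), which streams the file instead of materializing every line in memory at once. A minimal sketch, reusing the file_path variable from above:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

try (Stream<String> lines = Files.lines(Paths.get(file_path))) {
    lines.forEach(System.out::println); // each line is processed lazily
}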
Upvotes: 3
Reputation: 143
Scanner can't be as fast as BufferedReader, as it uses regular expressions to parse the input, which makes it slower by comparison. With BufferedReader you can read a block from a text file:
BufferedReader bf = new BufferedReader(new FileReader("FileName"));
You can then use readLine() to read from bf.
Hope it serves your purpose.
Upvotes: 9
Reputation: 166
Use BufferedReader for high-performance file access. But the default buffer size of 8192 bytes is often too small; for huge files you can increase the buffer size by orders of magnitude to boost your file-reading performance. For example:
BufferedReader br = new BufferedReader(new FileReader("file.dat"), 1000 * 8192);
String thisLine;
while ((thisLine = br.readLine()) != null) {
    System.out.println(thisLine);
}
Upvotes: 3
Reputation: 310840
You will find that BufferedReader.readLine() is as fast as you need: you can read millions of lines a second with it. It is more probable that your string splitting and handling is causing whatever performance problems you are encountering.
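For illustration, here is a sketch of the question's loop on top of BufferedReader, pulling the first word out with indexOf() instead of String.split() to avoid the per-line regex and array allocation (the keyword bodies are placeholders, as in the question):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public void read(String file) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            // take everything up to the first space as the keyword
            int space = line.indexOf(' ');
            String keyword = (space == -1) ? line : line.substring(0, space);
            if (keyword.equalsIgnoreCase("case")) {
                // do something
            } else if (keyword.equalsIgnoreCase("display")) {
                // do something
            }
            // ... remaining keywords exactly as in the question
        }
    }
}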
Upvotes: 46
Reputation: 3214
If you wish to read all lines together, you should have a look at the Files API of Java 7. It's really simple to use.
But a better approach would be to process this file in batches: have a reader which reads chunks of lines from the file and a writer which does the required processing or persists the data, as sketched below. Batching ensures the program will still work even if the line count grows to billions in the future, and you can use multithreading within the batch to increase its overall performance. I would recommend that you have a look at Spring Batch.
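As a rough illustration of the chunked reader/processor idea in plain Java (this is not Spring Batch; CHUNK_SIZE is an arbitrary choice and process() is a hypothetical placeholder):
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkedProcessor {
    static final int CHUNK_SIZE = 1000; // arbitrary batch size

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            List<String> chunk = new ArrayList<>(CHUNK_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == CHUNK_SIZE) {
                    process(chunk); // hand a full batch to the "writer" side
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                process(chunk); // flush the final partial batch
            }
        }
    }

    static void process(List<String> batch) {
        // hypothetical placeholder: persist or transform the batch here
    }
}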
Upvotes: -2
Reputation: 14278
You can use FileChannel and ByteBuffer from Java NIO. From what I have observed, the ByteBuffer size is the most critical factor in reading data faster. The code below reads the content of the file.
public static void main(String[] args) throws Exception {
    FileInputStream fileInputStream = new FileInputStream(
        new File("sample4.txt"));
    FileChannel fileChannel = fileInputStream.getChannel();
    ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
    // keep reading buffer-sized chunks until end of file
    while (fileChannel.read(byteBuffer) > 0) {
        byteBuffer.flip();
        while (byteBuffer.hasRemaining()) {
            System.out.print((char) byteBuffer.get());
        }
        byteBuffer.clear();
    }
    fileChannel.close();
}
You can check for '\n' to detect line breaks here. Thanks.
You can also use a scattering read to fill several buffers in a single call, which can read files faster, i.e.
fileChannel.read(buffers);
where
ByteBuffer b1 = ByteBuffer.allocate(B1);
ByteBuffer b2 = ByteBuffer.allocate(B2);
ByteBuffer b3 = ByteBuffer.allocate(B3);
ByteBuffer[] buffers = {b1, b2, b3};
This saves the user process from making several system calls (which can be expensive) and allows the kernel to optimize handling of the data, because it has information about the total transfer. If multiple CPUs are available, it may even be possible to fill and drain several buffers simultaneously.
From this book.
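For completeness, here is a minimal self-contained sketch of such a scattering read (the file name and buffer sizes are arbitrary placeholders):
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ScatterReadDemo {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("sample4.txt");
             FileChannel channel = in.getChannel()) {
            ByteBuffer b1 = ByteBuffer.allocate(128);
            ByteBuffer b2 = ByteBuffer.allocate(256);
            ByteBuffer b3 = ByteBuffer.allocate(512);
            ByteBuffer[] buffers = {b1, b2, b3};
            // a single read() fills b1, then b2, then b3, in order
            long bytesRead = channel.read(buffers);
            System.out.println("Read " + bytesRead + " bytes in one scattering read");
        }
    }
}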
Upvotes: 5
Reputation: 3390
You must investigate which part of the program is taking the time.
As per EJP's answer, you should use BufferedReader.
If the string processing is really what takes the time, then you should consider using threads: one thread reads from the file and enqueues lines, while other worker threads dequeue and process them. You will need to experiment with how many threads to use; the number should be related to the number of CPU cores, so that the whole CPU is utilized.
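A rough sketch of that pattern using a BlockingQueue (the queue capacity, thread count, and the processing step are all placeholder choices):
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PipelinedReader {
    // unique sentinel object; the identity comparison below is deliberate
    private static final String POISON_PILL = new String("EOF");

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    String item;
                    while ((item = queue.take()) != POISON_PILL) {
                        // process(item): split, parse, etc.
                    }
                    queue.put(POISON_PILL); // pass the pill on so sibling workers stop too
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                queue.put(line); // blocks when the queue is full, throttling the reader
            }
        }
        queue.put(POISON_PILL);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}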
Upvotes: 0