Reputation: 51
I have a large text file that I want to read. When I only read it, without doing anything else such as adding its text to a List, it takes at most a minute. But when I add the text to an ArrayList so that I can work with it afterwards, it becomes too slow. Do you know how I can read this data and use it? This is my code:
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReaderTEst {
    public static void main(String[] args) throws IOException {
        List<String> graphList = new ArrayList<>();
        List<String> edgeList = new ArrayList<>();
        FileInputStream inputStream = null;
        Scanner sc = null;
        try {
            inputStream = new FileInputStream("myText.txt");
            sc = new Scanner(inputStream, "UTF-8");
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // I use a UTF-8 file, so I need to delete the unneeded character
                line = line.replace("\uFEFF", "");
                if (line.isEmpty()) {
                    continue; // charAt(0) would throw on an empty line
                }
                if (Character.isWhitespace(line.charAt(0))) {
                    edgeList.add(line.trim());
                } else {
                    graphList.add(line);
                }
            }
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (sc != null) {
                sc.close();
            }
        }
    }
}
This takes too much time. Do you know how it could be made faster? The text file is 600 MB. When I change the loop to:
List<Integer> graphList = new ArrayList<>(1);
int i = 0;
while (sc.hasNextLine()) {
    String line = sc.nextLine();
    // I use a UTF-8 file, so I need to delete the unneeded character
    line = line.replace("\uFEFF", "");
    graphList.add(i++);
}
it works, but when I store the strings instead, it takes too long.
Upvotes: 1
Views: 169
Reputation: 412
First of all, I advise using the LinkedList implementation of List because of its architecture. ArrayList is backed by an array, whereas LinkedList consists of nodes. When an ArrayList reaches its capacity, it allocates a new, bigger array and copies the old one into it. Oracle has good documentation about this, which I recommend to you: LinkedList, ArrayList
Upvotes: 0
Reputation: 311039
You should use BufferedReader.readLine(). You can read millions of lines per second with that. Scanner is overkill for what you're doing.
BUT \uFEFF is not text. Is this really a text file? Is that a BOM? If so, it will only be at the beginning of the first line, so there is no need to scan for it in every line.
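A minimal sketch of that suggestion, assuming the file is UTF-8 and that an indented line is an edge while any other non-empty line is a graph entry, as in the question. The class name `BufferedGraphReader` and the `Result` holder are hypothetical names introduced only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BufferedGraphReader {
    // Hypothetical holder for the two lists used in the question.
    static final class Result {
        final List<String> graphList = new ArrayList<>();
        final List<String> edgeList = new ArrayList<>();
    }

    static Result read(BufferedReader reader) throws IOException {
        Result result = new Result();
        String line = reader.readLine();
        // A BOM, if present, can only be the first character of the file,
        // so it is checked once instead of on every line.
        if (line != null && line.startsWith("\uFEFF")) {
            line = line.substring(1);
        }
        while (line != null) {
            if (!line.isEmpty()) { // assumption: empty lines carry no data
                if (Character.isWhitespace(line.charAt(0))) {
                    result.edgeList.add(line.trim());
                } else {
                    result.graphList.add(line);
                }
            }
            line = reader.readLine();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question unless one is passed as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "myText.txt");
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            Result result = read(reader);
            System.out.println(result.graphList.size() + " graph lines, "
                    + result.edgeList.size() + " edge lines");
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, which replaces the manual finally block from the question.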
Upvotes: 1
Reputation: 29720
Your main issues are the following:
List<String> graphList = new ArrayList<>();
List<String> edgeList = new ArrayList<>();
You should initialize each List with an initial capacity so that the JVM does not need to automatically expand the backing array.
line = line.replace("\uFEFF", "");
This will also slow down your program. How often is \uFEFF in each line? I would check whether the line contains \uFEFF before attempting to replace it.
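Both points could look roughly like this. It is only a sketch: `EXPECTED_LINES` is an assumed estimate that you would have to choose for your own file, and `PresizedLists` and `stripBom` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedLists {
    // Assumed estimate; pick a value close to the real number of lines.
    static final int EXPECTED_LINES = 1_000_000;

    // Only call replace() when the character is actually present.
    static String stripBom(String line) {
        return line.indexOf('\uFEFF') >= 0 ? line.replace("\uFEFF", "") : line;
    }

    public static void main(String[] args) {
        // The backing arrays are allocated once, up front.
        List<String> graphList = new ArrayList<>(EXPECTED_LINES);
        List<String> edgeList = new ArrayList<>(EXPECTED_LINES);

        graphList.add(stripBom("\uFEFFnode1"));   // BOM removed
        edgeList.add(stripBom("node1 node2"));    // returned unchanged
        System.out.println(graphList + " " + edgeList);
    }
}
```

If the estimate is too low, the lists still grow as usual; if it is far too high, the extra capacity is simply wasted memory.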
Other than that, there's not much else to optimize; maybe you can utilize a FileChannel to read the file, but that's about it.
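One possible way to read lines through a FileChannel is to wrap it in a reader, as in the sketch below. The class name `ChannelLineCounter` is hypothetical, the file is assumed to be UTF-8, and whether this beats a plain BufferedReader is something you would have to measure on your own file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelLineCounter {
    // Counts lines by reading the file through a FileChannel.
    static long countLines(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
             BufferedReader reader =
                     new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question unless one is passed as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "myText.txt");
        System.out.println(countLines(path) + " lines");
    }
}
```

In place of the counter, the loop body can fill the two lists exactly as in the question.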
Upvotes: 0