grapexs

Reputation: 51

How to read a big text file and work with it in Java

I have a large text file that I want to read. When I read it without doing anything else (for example, without adding its text to a List), it takes at most one minute. But when I add the text to an ArrayList so that I can process it afterwards, it becomes too slow. Do you know how I can read this data and use it? This is my code:

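import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReaderTEst {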
public static void main(String[] args) throws IOException {
    List<String> graphList = new ArrayList<>();
    List<String> edgeList = new ArrayList<>();
    FileInputStream inputStream = null;
    Scanner sc = null;
    try {
        inputStream = new FileInputStream("myText.txt");
        sc = new Scanner(inputStream, "UTF-8");
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            line = line.replace("\uFEFF", ""); // I use a UTF-8 file, so I need to delete the unneeded character
            if (Character.isWhitespace(line.charAt(0))) {
                edgeList.add(line.trim());
            } else {
                graphList.add(line);
            }
        }
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
        if (sc != null) {
            sc.close();
        }
    }
}

}

It takes too much time. Do you know how it could be faster? My text file is 600 MB. When I change the loop to:

List<Integer> graphList = new ArrayList<>(1);
int i = 0;
while (sc.hasNextLine()) {
    String line = sc.nextLine();
    line = line.replace("\uFEFF", ""); // I use a UTF-8 file, so I need to delete the unneeded character
    graphList.add(i++);
}

it works, but when I add the strings instead, it takes too long.

Upvotes: 1

Views: 169

Answers (3)

Alexander Belenov

Reputation: 412

First of all, I advise using the LinkedList implementation of List because of how the two are built. An ArrayList is backed by an array, whereas a LinkedList consists of nodes. When an ArrayList reaches its capacity, it allocates a new, bigger array and copies the old one into it. Oracle has good documentation on this; I recommend reading the LinkedList and ArrayList pages.

Upvotes: 0

user207421

Reputation: 311039

You should use BufferedReader.readLine(). You can read millions of lines per second with that. Scanner is overkill for what you're doing.

BUT \uFEFF is not text. Is this really a text file? Is that a BOM? If so, it will only be at the beginning of the first line, so there is no need to scan for it in every line.
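
To make this concrete, here is a minimal sketch of the loop from the question rewritten around BufferedReader.readLine(), stripping the BOM from the first line only. The class name BufferedGraphReader, the Result holder and the read method are hypothetical names made up for illustration. Skipping empty lines is also my assumption: the original code would throw on them at charAt(0).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BufferedGraphReader {

    // Hypothetical holder for the two lists used in the question.
    static final class Result {
        final List<String> graphList = new ArrayList<>();
        final List<String> edgeList = new ArrayList<>();
    }

    static Result read(Path file) throws IOException {
        Result result = new Result();
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                if (firstLine) {
                    firstLine = false;
                    // A BOM can only appear at the very start of the file.
                    if (line.startsWith("\uFEFF")) {
                        line = line.substring(1);
                    }
                }
                if (line.isEmpty()) {
                    continue; // assumption: blank lines carry no data
                }
                if (Character.isWhitespace(line.charAt(0))) {
                    result.edgeList.add(line.trim());
                } else {
                    result.graphList.add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "myText.txt" is the file name from the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "myText.txt");
        Result result = read(file);
        System.out.println(result.graphList.size() + " graph lines, "
                + result.edgeList.size() + " edge lines");
    }
}
```

Note that keeping every line of a 600 MB file in memory still needs a large heap, whichever reader you use.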

Upvotes: 1

Jacob G.

Reputation: 29720

Your main issues are the following:

List<String> graphList = new ArrayList<>();
List<String> edgeList = new ArrayList<>();

You should initialize each List with an initial capacity so that the JVM does not need to automatically expand the backing array.
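
As a rough sketch of what that could look like: the capacity can be estimated from the file size. The class name PresizedLists, the method estimateLineCount and the average line length of 40 bytes are all assumptions made up for illustration; measure your own file to get a realistic figure.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedLists {

    // Assumption: a guessed average line length in bytes, not a measured value.
    static final int ASSUMED_AVG_LINE_BYTES = 40;

    // Estimates how many lines a file of the given size holds,
    // clamped so that the result is a safe array size.
    static int estimateLineCount(long fileSizeBytes) {
        long estimate = fileSizeBytes / ASSUMED_AVG_LINE_BYTES;
        return (int) Math.min(estimate, Integer.MAX_VALUE - 8);
    }

    public static void main(String[] args) {
        long fileSizeBytes = 600L * 1024 * 1024; // the 600 MB from the question
        int totalLines = estimateLineCount(fileSizeBytes);

        // Assumption: the lines split roughly evenly between the two lists.
        List<String> graphList = new ArrayList<>(totalLines / 2);
        List<String> edgeList = new ArrayList<>(totalLines / 2);

        System.out.println("Estimated lines: " + totalLines);
    }
}
```

If the estimate is too low, the lists still grow automatically; a good guess only saves some of the intermediate array copies.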

line = line.replace("\uFEFF", "");

This will also slow down your program. How often is \uFEFF in each line? I would check if the line contains \uFEFF before attempting to replace it.

Other than that, there's not much else to optimize; maybe you can utilize a FileChannel to read the file, but that's about it.

Upvotes: 0
