Jack

Reputation: 1437

Speed up reading CSV in Java

I have a relatively inefficient CSV reader, shown below. It takes more than 30 seconds to read 30,000+ lines. How can I make it read as fast as possible?

public class DataReader {

    private String csvFile;
    private List<String> sub = new ArrayList<String>();
    private List<List> master = new ArrayList<List>();


    public void ReadFromCSV(String csvFile) {

        String line = "";
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            System.out.println("Header " + br.readLine());
            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] list = line.split(cvsSplitBy);
//                System.out.println("the size is " + country[1]);
                for (int i = 0; i < list.length; i++) {
                    sub.add(list[i]);
                }
                List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
//                master.add(new ArrayList<String>(sub));
                master.add(temp);
                sub.removeAll(sub);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(master);
    }

    public List<List> getMaster() {
        return master;
    }

}

UPDATE: I have found that my code actually finishes reading in less than 1 second when run on its own. This DataReader is used by my simulation model to initialize the relevant properties, and it is the following part, which uses the imported data, that takes 40 seconds to finish. Could anyone help by looking at the generic part of the code?

//      add route network
        Network<Object> net = (Network<Object>)context.getProjection("IntraCity Network");
        IndexedIterable<Object> local_hubs = context.getObjects(LocalHub.class);
        for (int i = 0; i <= CSV_reader_route.getMaster().size() - 1; i++) {
            String source = (String) CSV_reader_route.getMaster().get(i).get(0);
            String target = (String) CSV_reader_route.getMaster().get(i).get(3);
            double dist = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(6));
            double time = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(7));

            Object source_hub = null;
            Object target_hub = null;
            Query<Object> source_query = new PropertyEquals<Object>(context, "hub_code", source);
            for (Object o : source_query.query()) {
                if (o instanceof LocalHub) {
                    source_hub = (LocalHub) o;
                }
                if (o instanceof GatewayHub) {
                    source_hub = (GatewayHub) o;
                }
            }

            Query<Object> target_query = new PropertyEquals<Object>(context, "hub_code", target);
            for (Object o : target_query.query()) {
                if (o instanceof LocalHub) {
                    target_hub = (LocalHub) o;
                }
                if (o instanceof GatewayHub) {
                    target_hub = (GatewayHub) o;
                }
            }

//          System.out.println(target_hub.getClass() + " " + time);
//          Route this_route = (Route) net.addEdge(source_hub, target_hub);
//          context.add(this_route);
//          System.out.println(net.getEdge(source_hub, target_hub));
            if (net.getEdge(source, target) == null) {
                Route this_route = (Route) net.addEdge(source, target);
                context.add(this_route);
//              this_route.setDist(dist);
//              this_route.setTime(time); }
            }



        } 

Upvotes: 1

Views: 838

Answers (3)

Ashok Prajapati

Reputation: 374

Your code performs many unnecessary write operations just to add the values of the current row to the master list. You can replace the existing code with the simpler version below.

Existing code:

String[] list = line.split(cvsSplitBy);
//                System.out.println("the size is " + country[1]);
for (int i = 0; i < list.length; i++) {
    sub.add(list[i]);
}

List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
//                master.add(new ArrayList<String>(sub));
master.add(temp);
sub.removeAll(sub);

Suggested code:

master.add(Arrays.asList(line.split(cvsSplitBy)));
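For context, here is a minimal sketch of how the whole read loop might look with that one-line change. The class name `SimpleCsvReader` and the method `readRows` are hypothetical, the reader is passed in so the sketch can run without a file, and it assumes plain comma-separated fields with no quoted commas. Note that `Arrays.asList` returns a fixed-size list, so rows cannot be grown afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class SimpleCsvReader {

    // Reads every data row from the source, skipping the first (header) line.
    // Assumes simple CSV: fields separated by commas, no quoting or escaping.
    static List<List<String>> readRows(Reader source) {
        List<List<String>> master = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header line
            String line;
            while ((line = br.readLine()) != null) {
                // One fixed-size list per row; no intermediate copy or clear needed.
                master.add(Arrays.asList(line.split(",")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return master;
    }

    public static void main(String[] args) {
        // In-memory sample data instead of a real file.
        List<List<String>> rows = readRows(new StringReader("id,name\n1,foo\n2,bar\n"));
        System.out.println(rows); // prints [[1, foo], [2, bar]]
    }
}
```

To read from disk, a `FileReader` for the CSV path could be passed in place of the `StringReader`.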

Upvotes: 2

Atul

Reputation: 3377

Extending the answer of @Alex R, you can also process the lines in parallel like this:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1).parallel()
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}
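One thing worth checking before switching to a parallel stream is that the rows still come out in file order. The sketch below, using made-up in-memory lines and a hypothetical class name `ParallelOrderCheck`, compares a sequential and a parallel parse of the same ordered list; `Collectors.toList()` keeps encounter order for an ordered source, so the two results should match. Whether parallelism actually saves time on a file of this size is something to measure rather than assume.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this sketch.
class ParallelOrderCheck {

    // Builds synthetic CSV lines "0,a0", "1,a1", ... (illustrative data only).
    static List<String> sampleLines(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> i + ",a" + i)
                .collect(Collectors.toList());
    }

    // Splits each line on commas, either sequentially or in parallel.
    static List<List<String>> parse(List<String> lines, boolean parallel) {
        return (parallel ? lines.parallelStream() : lines.stream())
                .map(line -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(50_000);
        // Both parses should produce the same rows in the same order.
        System.out.println(parse(lines, false).equals(parse(lines, true))); // prints true
    }
}
```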

Upvotes: 1

Alex R

Reputation: 3311

I don't have a CSV that big, but you could try the following:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1)
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}

EDIT: I tried it with a CSV sample with 50k entries and the code runs in less than one second.
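A small caveat: the stream returned by `Files.lines` holds the file open until it is closed, so it is usually wrapped in try-with-resources. The following is a sketch of that variant, not a replacement for the code above; the class name `CsvLoader` and the methods `parse` and `load` are hypothetical, and it again assumes no quoted commas in the data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this sketch.
class CsvLoader {

    // Turns a stream of raw lines into rows, dropping the header line.
    static List<List<String>> parse(Stream<String> lines) {
        return lines.skip(1)
                .map(line -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
    }

    // Opens the file, parses it, and closes the file handle when done.
    static List<List<String>> load(Path csvPath) {
        try (Stream<String> lines = Files.lines(csvPath)) {
            return parse(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory sample lines stand in for a real file here.
        System.out.println(parse(Stream.of("id,name", "1,foo", "2,bar"))); // prints [[1, foo], [2, bar]]
    }
}
```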

Upvotes: 2
