user3338991
user3338991

Reputation:

java apache poi loop performance

i've been working with a code that works with a lot of data, which at first reads whole long xlsx into arraylist of arraylist, then in method worker it runs line by line and everytime a condition is met, it writes some columns and do some stuff (this might be 1000 per worker method), thus it writes 1 row per worker. Worker method is implemented ALOT of times, thus the final xlsx will have let's say 100000 lines, all with 1000 lines (hopefully more later.):

public static void main(String[] args) throws Throwable {
 ArrayList<ArrayList> data; //filled by many thousands of rows from xlsx table in other part of code
 SXSSFWorkbook workbook = new SXSSFWorkbook(100);
 Sheet sheet = workbook.createSheet("sheet1");
 FileOutputStream outputStream = new FileOutputStream("D:/output.xlsx");
 int rowCount = 0;
 Row row = sheet.createRow(0);

  for (int i = 1, 1 < 100000, i++){
    row = sheet.createRow(i);
    worker(data,sheet,workbook,row)
  }

 workbook.write(outputStream);
 outputStream.close();
 workbook.dispose();

}    

public static Row worker (ArrayList<ArrayList> data,Sheet sheet, SXSSFWorkbook workbook, Row row){

  int columnCount = 0;

        for (int i = 0; i < data.get(0).size(); i++) { //loop line by line through whole table
           if (data.get(1).get(i) meets some condition){
             Cell cell = row.createCell(++columnCount);
             cell.setCellValue("column number" + columnCount );
             do some other stuff;
           }
         }
return row;
   }

What is the bottleneck in this kind of computation? It runs much faster on my higher perf pc than nb, but i also tried to run it on 8core 28gb ram xeon server(win s 2012), where everything was underutilized, but there wasn't much (if any) improvement to my home pc. How can i make this small program run faster? Work with db instead of arraylist of arraylist? Make it somehow to compute in parallel? - any pointer here as which part to make parallel to which. Have no other ideas as i am amateur learn on the fly notaprogramer:/

Thanks for any input.

Upvotes: 0

Views: 1111

Answers (1)

centic
centic

Reputation: 15872

As you are not using multiple threads, only one Core of the CPU is used. Thus multiple cores or more memory do not improve much, only the bare CPU speed matters and that seems to be comparable between the two systems.

You could try to make this work in parallel, but note that POI itself does not make guarantees about thread-safety so you tread on thin ice a bit here. You might be able to create all the Rows up-front and then populate the cells in parallel.

Another option would be to try to use something like the streaming XLSX reader to process the file while it is read, but you would then write the data to a new file (maybe using SXSSFWorkbook to also not consume all the memory) as in-place writing is not possible.

Upvotes: 1

Related Questions