Ranking specific elements within a data structure - is there a more efficient way?

Question

I am ranking certain groups of elements within a .csv file. My program works. However ...

I am seeking advice on how to improve the efficiency of a program I have written. I am not seeking a review of my code (Stackoverflow ref.), nor am I requesting that someone write code for me. All I am asking is: "Is there a more efficient way, and if so, what?"

I have a program that takes multiple .csv files, modifies them and adds extra data. These files are then saved. Below is a representation of the input data:

ISBN, Shop, Cost, ReviewScore,
9780008305796, A Bookshop, 11.99, 4.8,
9781787460966, A Bookshop, 6.99, 4.3,
9781787460966, Lots of books, 5.99, 4.4,
9781838770013, A Bookshop, 6.99, 3.8,
9780008305796, The bookseller, 13.99, 4.7,
9780008305796, Lots of books, 16.99, 4.1,

Note: each .csv file is normally thousands of lines long. There could be 1 to 20 instances of an ISBN. The .csv is not ordered by any column.

My program works as follows (pseudocode):

1. load csv into String[][]
2. iterate through String[][] to create a map: with k = ISBN, v = number of occurrences of that ISBN
3. iterate through String[][]
   3.1 get the ISBN value from map then save each line that has that ISBN (stop when value reached)
   3.2 then rank the price and reviews of saved lines, and save the lines into another var.
   3.3 delete key
   3.4 go back to 3. until there are no keys
4. save into .csv

The data will now look like:

ISBN, Shop, Cost, ReviewScore, CostRank, ReviewRank
9780008305796, A Bookshop, 11.99, 4.8, 1, 1
9781787460966, A Bookshop, 6.99, 4.3, 2, 2
9781787460966, Lots of books, 5.99, 4.4, 1, 1
9781838770013, A Bookshop, 6.99, 3.8, 1, 1
9780008305796, The bookseller, 13.99, 4.7, 2, 2
9780008305796, Lots of books, 16.99, 4.1, 3, 3

This program does not depend on the type of data structure the .csv is loaded into. It could be a List, List of Lists, Collection etc.

Matthew · Accepted Answer

You /could/ do it in a single pass. The code would look something like this:

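  // Needs java.util.HashMap and java.util.Map. IsbnData and the two helper
  // methods are placeholders for a class and methods you would write yourself.
  Map<String, IsbnData> dataStore = new HashMap<>();
  for (String[] row : rows) {
     String isbn = row[0]; // or whatever the index of the ISBN column is
     IsbnData datum = dataStore.get(isbn);
     if (datum == null) {
        // first row seen for this ISBN
        datum = createIsbnDataFromRow(row);
     } else {
        // another row for an ISBN that is already in the map
        datum = updateDatumWithMoreData(datum, row);
     }

     dataStore.put(isbn, datum);
  }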

The main benefit of this is that instead of having to deal with String[] you have nicely structured classes and the code is easier to read.
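
To make the structured-class idea a bit more concrete, here is a minimal sketch in Java. Offer, IsbnRanker and groupAndRank are names made up for illustration, not anything from the question. It assumes the column order ISBN, Shop, Cost, ReviewScore, that rank 1 means the cheapest cost and the highest review score (as the example output suggests), and that ties simply get consecutive ranks in file order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical structured row; fields follow the CSV header in the question.
class Offer {
    final String isbn;
    final String shop;
    final double cost;
    final double reviewScore;
    int costRank;    // 1 = cheapest within the ISBN (assumed)
    int reviewRank;  // 1 = highest review score within the ISBN (assumed)

    Offer(String[] row) {
        this.isbn = row[0].trim();
        this.shop = row[1].trim();
        this.cost = Double.parseDouble(row[2].trim());
        this.reviewScore = Double.parseDouble(row[3].trim());
    }
}

class IsbnRanker {
    // One pass over the rows to group by ISBN, then rank inside each group.
    static Map<String, List<Offer>> groupAndRank(List<String[]> rows) {
        Map<String, List<Offer>> byIsbn = new HashMap<>();
        for (String[] row : rows) {
            Offer offer = new Offer(row);
            byIsbn.computeIfAbsent(offer.isbn, k -> new ArrayList<>()).add(offer);
        }
        for (List<Offer> group : byIsbn.values()) {
            // Sort copies so each group keeps its original file order.
            List<Offer> byCost = new ArrayList<>(group);
            byCost.sort(Comparator.comparingDouble((Offer o) -> o.cost));
            for (int i = 0; i < byCost.size(); i++) {
                byCost.get(i).costRank = i + 1;
            }
            List<Offer> byReview = new ArrayList<>(group);
            byReview.sort(Comparator.comparingDouble((Offer o) -> o.reviewScore).reversed());
            for (int i = 0; i < byReview.size(); i++) {
                byReview.get(i).reviewRank = i + 1;
            }
        }
        return byIsbn;
    }
}
```

Calling IsbnRanker.groupAndRank(rows) returns a map from each ISBN to its offers with both ranks filled in. Each group is sorted twice, but with at most 20 rows per ISBN that cost should be small.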

The code /may/ run faster, but that's probably irrelevant since it's much more likely to run out of memory before the speed matters. (Don't confuse this with the program being slow - it may well be slow, but that is due to reading / parsing the CSV files. The speed gain from passing over the CSV files fewer times after you've parsed them is negligible.)
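
Since reading and parsing is where the time goes, one option is to fold the grouping into the reading loop, so there is no separate pass over an intermediate String[][]. The sketch below is only an illustration under assumptions: StreamingGrouper and groupByIsbn are invented names, the file is assumed to have a header line, and the naive split on commas assumes no field contains a quoted comma. All rows still end up in memory, so this does not by itself solve an out-of-memory problem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StreamingGrouper {
    // Reads CSV text line by line and groups the trimmed fields by ISBN (column 0).
    // Assumes a header line, comma-separated fields, and no quoted commas.
    static Map<String, List<String[]>> groupByIsbn(Reader source) {
        Map<String, List<String[]>> byIsbn = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // skip the header line
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                // split drops the empty field left by a trailing comma
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                byIsbn.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byIsbn;
    }
}
```

For a real file you could pass in a java.io.FileReader (or Files.newBufferedReader); a proper CSV library would be the safer choice if the shop names can contain commas.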
