Reputation: 50
I am ranking certain groups of elements within a .csv file. My program works. However ...
I am seeking advice on how to improve the efficiency of a program I have written. I do not seek a review of my code. Stackoverflow ref. Nor am I requesting someone to write code for me. All I am asking is: "Is there a more efficient way, and if so, what?"
I have a program that takes multiple .csv files, modifies them and adds extra data. These files are then saved. Below is a representation of the input data:
ISBN, Shop, Cost, ReviewScore,
9780008305796, A Bookshop, 11.99, 4.8,
9781787460966, A Bookshop, 6.99, 4.3,
9781787460966, Lots of books, 5.99, 4.4,
9781838770013, A Bookshop, 6.99, 3.8,
9780008305796, The bookseller, 13.99, 4.7,
9780008305796, Lots of books, 16.99, 4.1,
Note: each .csv file is normally thousands of lines long. There can be 1 to 20 instances of an ISBN. The .csv is not ordered by any column.
My program works as follows (pseudocode):
The data will now look like:
ISBN, Shop, Cost, ReviewScore, CostRank, ReviewRank
9780008305796, A Bookshop, 11.99, 4.8, 1, 1
9781787460966, A Bookshop, 6.99, 4.3, 2, 2
9781787460966, Lots of books, 5.99, 4.4, 1, 1
9781838770013, A Bookshop, 6.99, 3.8, 1, 1
9780008305796, The bookseller, 13.99, 4.7, 2, 2
9780008305796, Lots of books, 16.99, 4.1, 3, 3
This program does not depend on the type of data structure the .csv is loaded into. It could be a List, List of Lists, Collection etc.
Upvotes: 1
Views: 99
Reputation: 11357
You /could/ do it in a single pass. The code would look something like this:
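import java.util.HashMap;
import java.util.Map;

Map<String, IsbnData> dataStore = new HashMap<>();
for (String[] row : rows) {
    String isbn = row[0]; // or whatever the index of the ISBN column is
    IsbnData datum = dataStore.get(isbn);
    if (datum == null) {
        // first time this ISBN is seen
        datum = createIsbnDataFromRow(row);
    } else {
        // fold this row into the data already collected for the ISBN
        datum = updateDatumWithMoreData(datum, row);
    }
    dataStore.put(isbn, datum);
}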
The main benefit of this is that instead of having to deal with String[] you have nicely structured classes, and the code is easier to read.
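To make the "structured classes" point a bit more concrete, here is a minimal, self-contained sketch of the same single-pass idea. It is only one way to do it, and everything in it is an assumption on my part: the names IsbnRanking, Offer and rank are made up, each ISBN simply maps to a list of its rows (rather than a dedicated IsbnData class), the columns are in the order shown in the question, and ties get whatever order the input had, since the question doesn't say how ties should be ranked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IsbnRanking {

    /** One parsed CSV row plus the two ranks computed for it (hypothetical class). */
    static final class Offer {
        final String isbn;
        final String shop;
        final double cost;
        final double reviewScore;
        int costRank;   // 1 = cheapest offer for this ISBN
        int reviewRank; // 1 = highest review score for this ISBN

        Offer(String[] row) {
            // Assumed column order: ISBN, Shop, Cost, ReviewScore
            this.isbn = row[0].trim();
            this.shop = row[1].trim();
            this.cost = Double.parseDouble(row[2].trim());
            this.reviewScore = Double.parseDouble(row[3].trim());
        }
    }

    /** Groups the rows by ISBN in one pass, then ranks each (small) group. */
    static List<Offer> rank(List<String[]> rows) {
        Map<String, List<Offer>> byIsbn = new HashMap<>();
        List<Offer> offers = new ArrayList<>(); // keeps the original row order
        for (String[] row : rows) {
            Offer offer = new Offer(row);
            offers.add(offer);
            byIsbn.computeIfAbsent(offer.isbn, k -> new ArrayList<>()).add(offer);
        }
        for (List<Offer> group : byIsbn.values()) {
            // Cheapest first; the sort is stable, so ties keep input order.
            group.sort(Comparator.comparingDouble((Offer o) -> o.cost));
            for (int i = 0; i < group.size(); i++) {
                group.get(i).costRank = i + 1;
            }
            // Best review score first.
            group.sort(Comparator.comparingDouble((Offer o) -> o.reviewScore).reversed());
            for (int i = 0; i < group.size(); i++) {
                group.get(i).reviewRank = i + 1;
            }
        }
        return offers;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add("9781787460966, A Bookshop, 6.99, 4.3,".split(","));
        rows.add("9781787460966, Lots of books, 5.99, 4.4,".split(","));
        for (Offer o : rank(rows)) {
            System.out.println(o.isbn + ", " + o.shop + ", " + o.cost + ", "
                    + o.reviewScore + ", " + o.costRank + ", " + o.reviewRank);
        }
    }
}
```

Since each ISBN has at most about 20 rows, the per-group sorts are cheap, and the rows are written back out in the order they were read.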
The code /may/ run faster, but that's probably irrelevant, since the program is much more likely to run out of memory before the speed matters. (Don't confuse this with the program being slow: it may well be slow, but that is due to reading and parsing the CSV files. The speed gain from passing over the data fewer times after you've parsed it is negligible.)
Upvotes: 1