Reputation: 109
I'm currently writing a program that takes in two CSVs: one containing database keys (and other information irrelevant to the current issue), the other an asset manifest. The program reads the database key from the first CSV, queries an online database to retrieve the asset key, then looks up the asset status in the second CSV. (This is a workaround for a stupid API issue.)
My problem is that while the CSV being iterated over is relatively short (usually only about 300 lines), the asset manifest is easily 10,000 lines long (and sorted, though not by the key I can obtain from the first CSV). I obviously don't want to iterate over the entire asset manifest for every single input line, since that will take roughly 10 eternities.
I'm a fairly inexperienced programmer, so I only know of sorting/searching algorithms, and I definitely don't know what would be the one to use for this. What algorithm would be the most efficient? Is there a way to "batch-query" the manifest for all of the assets listed in the input CSV that would be faster than searching the manifest individually for each key? Or should I use a tree or hashtable or something else I heard mentioned in other SE threads? I don't know anything about the performance implications of any of these.
I can format the manifest as needed when it's input (it's just copy-pasted into a GUI), so I guess I could iterate over the entire manifest when it's input and make a hashtable of key:line pairs and then search that? Or I could turn it into a 2D array and just search the specified index? Those are all I can think of.
Problem is, I don't know how much time computer operations like that take, and if that would just waste time or actually improve performance.
P.S. I'm currently using Java for this since it's all I know, but if another language would be faster then I'm all ears.
Upvotes: 0
Views: 334
Reputation: 2252
The simple solution is to create a HashMap: iterate through one of the files and add each line of that file to the HashMap (with the corresponding key and value), then iterate through the other file and check whether the HashMap contains each key. If it does, add the data to another HashMap, and after the iteration return the second HashMap.
Imagine we have a test1.csv file with the columns key,name,family as below:
5000,ehsan,tashkhisi
2,ali,lllll
3,amel,lllll
1,azio,skkk
And a test2.csv file with the columns key,status as below:
1000,status1
1,status2
5000,status3
4000,status4
4001,status1
4002,status3
5,status1
We want to have output like this:
1 -> status2
5000 -> status3
Simple code will be like below:
Java 8 Stream:
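// Imports needed: java.io.IOException, java.nio.file.Files, java.nio.file.Paths,
// java.util.Map, java.util.stream.Collectors, java.util.stream.Stream
private static Map<String, String> findDataInTwoFilesJava8() throws IOException {
    // Files.lines keeps the file open, so close both streams with try-with-resources.
    try (Stream<String> manifest = Files.lines(Paths.get("/tmp/test2.csv"));
         Stream<String> input = Files.lines(Paths.get("/tmp/test1.csv"))) {
        // Index test2.csv by key: key -> status.
        Map<String, String> map = manifest.map(a -> a.split(","))
                .collect(Collectors.toMap(a -> a[0], a -> a[1]));
        // Keep only the keys from test1.csv that appear in the index.
        return input.map(a -> a.split(","))
                .filter(a -> map.containsKey(a[0]))
                .collect(Collectors.toMap(a -> a[0], a -> map.get(a[0])));
    }
}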
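One caveat with the stream version: the two-argument Collectors.toMap throws an IllegalStateException if the same key appears twice, and a line without a comma would fail on a[1]. The following is a minimal sketch of a more forgiving index builder, assuming that keeping the first occurrence of a duplicate key is acceptable and that malformed lines can be skipped. The class and method names (ManifestIndex, index) are hypothetical, and the lines are passed in as a list rather than read from a file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ManifestIndex {

    // Builds key -> status from "key,status" lines.
    // Assumption: on duplicate keys the first occurrence wins.
    public static Map<String, String> index(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                // Skip lines that do not have at least a key and a status column.
                .filter(cols -> cols.length >= 2)
                .collect(Collectors.toMap(
                        cols -> cols[0].trim(),
                        cols -> cols[1].trim(),
                        (first, second) -> first)); // merge function for duplicate keys
    }

    public static void main(String[] args) {
        Map<String, String> idx = index(Arrays.asList(
                "1,status2", "1,status9", "malformed", "5000,status3"));
        System.out.println(idx.get("1"));    // first occurrence of key 1 is kept
        System.out.println(idx.get("5000"));
    }
}
```

If the last occurrence should win instead, the merge function can return its second argument.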
Simple Java:
// Imports needed: java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.HashMap, java.util.Map
private static Map<String, String> findDataInTwoFiles() throws IOException {
    String line;
    // Index test2.csv by key: key -> status.
    Map<String, String> map = new HashMap<>();
    try (BufferedReader br = new BufferedReader(new FileReader("/tmp/test2.csv"))) {
        while ((line = br.readLine()) != null) {
            String[] lineData = line.split(",");
            map.put(lineData[0], lineData[1]);
        }
    }
    // Look up each key from test1.csv in the index.
    Map<String, String> resultMap = new HashMap<>();
    try (BufferedReader br = new BufferedReader(new FileReader("/tmp/test1.csv"))) {
        while ((line = br.readLine()) != null) {
            String key = line.split(",")[0];
            if (map.containsKey(key)) {
                resultMap.put(key, map.get(key));
            }
        }
    }
    return resultMap;
}
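Since the question mentions that the manifest is pasted into a GUI rather than read from disk, here is a minimal sketch of the same approach working on in-memory lines, using the sample data above. The class and method names (ManifestLookup, buildIndex, lookupStatuses) are hypothetical, and it assumes plain comma-separated lines with no quoted fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManifestLookup {

    // One pass over the manifest: key -> status.
    public static Map<String, String> buildIndex(List<String> manifestLines) {
        Map<String, String> index = new HashMap<>();
        for (String line : manifestLines) {
            String[] cols = line.split(",");
            if (cols.length >= 2) { // skip blank or malformed lines
                index.put(cols[0], cols[1]);
            }
        }
        return index;
    }

    // One hash lookup per input line; LinkedHashMap keeps the input order.
    public static Map<String, String> lookupStatuses(List<String> inputLines,
                                                     Map<String, String> index) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : inputLines) {
            String key = line.split(",")[0];
            String status = index.get(key);
            if (status != null) {
                result.put(key, status);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "5000,ehsan,tashkhisi", "2,ali,lllll", "3,amel,lllll", "1,azio,skkk");
        List<String> manifest = Arrays.asList(
                "1000,status1", "1,status2", "5000,status3", "4000,status4",
                "4001,status1", "4002,status3", "5,status1");
        Map<String, String> result = lookupStatuses(input, buildIndex(manifest));
        result.forEach((key, status) -> System.out.println(key + " -> " + status));
    }
}
```

For the sizes in the question, building the index is a single pass over roughly 10,000 manifest lines, and each of the roughly 300 lookups takes expected constant time. A nested scan could need up to 300 × 10,000 = 3,000,000 comparisons in the worst case.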
Upvotes: 1