Reputation: 99
Hi all, please help me with this scenario: I have multiple files such as aaa.txt, bbb.txt, and ccc.txt, with data as follows.
aaa.txt:
100110,StringA,22
200110,StringB,2
300110,StringC, 12
400110,StringD,34
500110,StringE,423
bbb.txt as:
100110,StringA,20.1
200110,StringB,2.1
300110,StringC, 12.2
400110,StringD,3.2
500110,StringE,42.1
and ccc.txt as:
100110,StringA,2.1
200110,StringB,2.1
300110,StringC, 11
400110,StringD,3.2
500110,StringE,4.1
Now I have to read all three files (they are huge) and report the result as 100110: (22, 20.1, 2.1). The issue is the size of the files and how to do this in an optimized way.
Upvotes: 0
Views: 102
Reputation: 5937
I assume you have some sort of code to handle reading the files line by line, so I'll pseudocode this with a scanner that can keep pulling lines.
The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.
HashMap<String, String[]> map = new HashMap<>();
while (aaa.hasNextLine()) {
    String[] lineContents = aaa.nextLine().split(",");
    String[] array = new String[3];
    array[0] = lineContents[2].trim();
    map.put(lineContents[0], array);
}
while (bbb.hasNextLine()) {
    String[] lineContents = bbb.nextLine().split(",");
    String[] array = map.get(lineContents[0]);
    if (array == null) {
        // key was not in aaa.txt; create its array now
        array = new String[3];
        map.put(lineContents[0], array);
    }
    array[1] = lineContents[2].trim();
}
// same for ccc, writing into index 2
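Putting the pieces together, here is a self-contained sketch of the same approach. The inline strings stand in for the three files (hypothetical sample data); in practice you would wrap a Scanner around each file instead.

```java
import java.util.*;

public class MergeFiles {
    // Fill one slot of each key's value array from one file's lines.
    // Each line looks like "100110,StringA,22"; fileIndex says which
    // column of the result this file supplies.
    static void load(Scanner in, Map<String, String[]> map, int fileIndex, int fileCount) {
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split(",");
            // computeIfAbsent creates the array the first time a key is seen
            String[] values = map.computeIfAbsent(parts[0], k -> new String[fileCount]);
            values[fileIndex] = parts[2].trim();
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> map = new HashMap<>();
        String[] files = {
            "100110,StringA,22\n200110,StringB,2",
            "100110,StringA,20.1\n200110,StringB,2.1",
            "100110,StringA,2.1\n200110,StringB,2.1"
        };
        for (int i = 0; i < files.length; i++) {
            load(new Scanner(files[i]), map, i, files.length);
        }
        // Print results in the question's format, e.g. 100110: (22, 20.1, 2.1)
        for (Map.Entry<String, String[]> e : map.entrySet()) {
            System.out.println(e.getKey() + ": (" + String.join(", ", e.getValue()) + ")");
        }
    }
}
```

Note that this keeps the whole key set in memory, which is fine as long as the set of distinct keys fits in the heap even if the files themselves are huge.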
To parallelize this, you would probably use a concurrent map such as ConcurrentHashMap.
Then you'd create 3 threads that just read and put.
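A minimal concurrent sketch, assuming a ConcurrentHashMap shared by one thread per file (the inline strings are hypothetical stand-ins for the files). Because each thread writes a different slot of the array, and computeIfAbsent is atomic on ConcurrentHashMap, two threads that see the same key for the first time still end up sharing one array:

```java
import java.util.*;
import java.util.concurrent.*;

public class ConcurrentMerge {
    // One thread per file calls this, filling its own slot in the shared map.
    static void load(Scanner in, ConcurrentMap<String, String[]> map, int slot, int slots) {
        while (in.hasNextLine()) {
            String[] parts = in.nextLine().split(",");
            map.computeIfAbsent(parts[0].trim(), k -> new String[slots])[slot] = parts[2].trim();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, String[]> map = new ConcurrentHashMap<>();
        String[] files = {"100110,A,22", "100110,A,20.1", "100110,A,2.1"};
        ExecutorService pool = Executors.newFixedThreadPool(files.length);
        for (int i = 0; i < files.length; i++) {
            final int slot = i;
            pool.submit(() -> load(new Scanner(files[slot]), map, slot, files.length));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("100110: (" + String.join(", ", map.get("100110")) + ")");
        // prints 100110: (22, 20.1, 2.1)
    }
}
```

Whether the threads actually help depends on the disk: three threads reading three huge files on one spinning disk can be slower than reading them one after another.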
Upvotes: 1
Reputation: 53819
If your files are all ordered, simply maintain an array of Scanners
pointing to your files, read the lines one by one, and write the result to an output file as you go.
Doing so, you will only keep in memory as many lines as the number of files. It is both time and memory efficient.
If your files are not ordered, you can use the sort
command to sort them.
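A minimal sketch of that lockstep merge, assuming the files are sorted on the key and contain the same keys in the same order (as in the question's example); the class and method names are hypothetical:

```java
import java.io.*;
import java.util.*;

public class SortedMerge {
    // Read the sorted files in lockstep: one line from each file per key,
    // so only one line per file is ever in memory at a time.
    static void merge(List<BufferedReader> readers, Writer out) throws IOException {
        while (true) {
            String first = readers.get(0).readLine();
            if (first == null) break;                 // all files exhausted together
            String[] parts = first.split(",");
            StringBuilder row = new StringBuilder(parts[0] + ": (" + parts[2].trim());
            for (int i = 1; i < readers.size(); i++) {
                row.append(", ").append(readers.get(i).readLine().split(",")[2].trim());
            }
            out.write(row.append(")\n").toString());
        }
    }
}
```

If the files can disagree on which keys they contain, you would need a true k-way merge that always advances the reader holding the smallest key, but the memory bound stays the same: one line per file.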
Upvotes: 0
Reputation: 1723
Unless you are doing a lot of processing while loading these files, or are reading many smaller files, this might work better as a sequential operation.
Upvotes: 0