Logicalj

Reputation: 99

Read data from multiple files and apply business logic

Hi all, please help me with this scenario: I have multiple files, like aaa.txt, bbb.txt, and ccc.txt, with data as follows.

aaa.txt:

100110,StringA,22
200110,StringB,2
300110,StringC, 12
400110,StringD,34
500110,StringE,423

bbb.txt as:

100110,StringA,20.1
200110,StringB,2.1  
300110,StringC, 12.2
400110,StringD,3.2
500110,StringE,42.1

and ccc.txt as:

100110,StringA,2.1
200110,StringB,2.1  
300110,StringC, 11
400110,StringD,3.2
500110,StringE,4.1

Now I have to read all three files (huge files) and report the result as 100110: (22, 20.1, 2.1). The issue is the size of the files and how to do this in an optimized way.

Upvotes: 0

Views: 102

Answers (3)

Compass

Reputation: 5937

I assume you have some sort of code to handle reading the files line by line, so I'll pseudocode a scanner that can keep pulling lines.

The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.

    HashMap<String, String[]> map = new HashMap<>();

    while (aaa.hasNextLine()) {
        String[] lineContents = aaa.nextLine().split(",");
        String[] array = new String[3];
        array[0] = lineContents[2].trim();
        map.put(lineContents[0], array);
    }

    while (bbb.hasNextLine()) {
        String[] lineContents = bbb.nextLine().split(",");
        String[] array = map.get(lineContents[0]);
        if (array != null) {
            // the array is already in the map, so updating it in place is enough
            array[1] = lineContents[2].trim();
        } else {
            array = new String[3];
            array[1] = lineContents[2].trim();
            map.put(lineContents[0], array);
        }
    }

    // same for c, with a new index of 2

To make this thread-safe, you would probably use one of the concurrent map implementations, such as ConcurrentHashMap.

Then you'd create 3 threads that just read and put.
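A minimal sketch of that threaded variant, using in-memory lists in place of the actual file scanners (the file names and record layout are taken from the question; everything else is illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMerge {
    // Loads one file's lines into the shared map at the given slot (0, 1, or 2).
    static void load(List<String> lines, int slot, Map<String, String[]> map) {
        for (String line : lines) {
            String[] parts = line.split(",");
            // computeIfAbsent is atomic on ConcurrentHashMap, so three threads
            // can insert keys concurrently without external locking; each thread
            // then writes only its own slot of the value array
            String[] values = map.computeIfAbsent(parts[0], k -> new String[3]);
            values[slot] = parts[2].trim();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String[]> map = new ConcurrentHashMap<>();
        // stand-ins for aaa.txt, bbb.txt, ccc.txt
        List<String> aaa = List.of("100110,StringA,22", "200110,StringB,2");
        List<String> bbb = List.of("100110,StringA,20.1", "200110,StringB,2.1");
        List<String> ccc = List.of("100110,StringA,2.1", "200110,StringB,2.1");

        Thread t1 = new Thread(() -> load(aaa, 0, map));
        Thread t2 = new Thread(() -> load(bbb, 1, map));
        Thread t3 = new Thread(() -> load(ccc, 2, map));
        t1.start(); t2.start(); t3.start();
        t1.join(); t2.join(); t3.join();

        // prints 100110: (22, 20.1, 2.1)
        System.out.println("100110: (" + String.join(", ", map.get("100110")) + ")");
    }
}
```

Note that this only works because each thread writes a distinct index of the value array, and the `join()` calls establish visibility of those writes before the map is read.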

Upvotes: 1

Jean Logeart

Reputation: 53819

If your files are all ordered, simply maintain an array of Scanners pointing to your files, read the lines one by one, and write the result to an output file as you go.

Doing so, you will only keep in memory as many lines as the number of files. It is both time and memory efficient.

If your files are not ordered, you can use the sort command to sort them.
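A sketch of the lockstep read, assuming every file contains the same keys in the same order (as in the question's sample data); `StringReader` stands in for the real `FileReader`s, and in practice you would write each combined line to an output file rather than build a string:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SortedMerge {
    // Reads all files in lockstep: one line from each per record, so memory
    // use is one line per file regardless of total file size.
    static String combine(BufferedReader... readers) throws IOException {
        StringBuilder out = new StringBuilder();
        String first;
        while ((first = readers[0].readLine()) != null) {
            String[] parts = first.split(",");
            out.append(parts[0]).append(": (").append(parts[2].trim());
            for (int i = 1; i < readers.length; i++) {
                String[] other = readers[i].readLine().split(",");
                out.append(", ").append(other[2].trim());
            }
            out.append(")\n");
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String result = combine(
            new BufferedReader(new StringReader("100110,StringA,22\n200110,StringB,2")),
            new BufferedReader(new StringReader("100110,StringA,20.1\n200110,StringB,2.1")),
            new BufferedReader(new StringReader("100110,StringA,2.1\n200110,StringB,2.1")));
        System.out.print(result);
    }
}
```

If the key sets can differ between files, you would need a proper k-way merge that compares the current key of each reader instead of assuming they line up.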

Upvotes: 0

PeterK

Reputation: 1723

Unless you are doing a lot of processing while loading these files, or are reading many smaller files, this might work better as a sequential operation.

Upvotes: 0
