DuskFall

Reputation: 422

Merging two files line by line in Java

Is there a more efficient way than the one I'm currently using to merge two files line by line, appending each line from file2 to the corresponding line of file1?

If file1 contains

a1
b1
c1

And file2 contains

a2
b2
c2

Then the output file should contain

a1,a2
b1,b2
c1,c2
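For reference, here is a minimal sketch of the two-file merge described above. It assumes UTF-8 input and that the files have the same number of lines (it simply stops at the end of the shorter one). The class and method names (`TwoFileMerge`, `merge`, `mergeFiles`, `mergeStrings`) are illustrative, not from the question. Both readers advance in lockstep, so each file is read exactly once and no line counting is needed up front.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoFileMerge {

    // Reads both inputs in lockstep and writes "left,right" for each pair of lines.
    // Stops as soon as either input runs out (assumption: inputs have equal length).
    static void merge(BufferedReader left, BufferedReader right, Writer out) throws IOException {
        String a;
        String b;
        while ((a = left.readLine()) != null && (b = right.readLine()) != null) {
            out.write(a);
            out.write(',');
            out.write(b);
            out.write('\n');
        }
    }

    // Files on disk: each file is opened once and output goes through one BufferedWriter.
    static void mergeFiles(Path file1, Path file2, Path output) throws IOException {
        try (BufferedReader left = Files.newBufferedReader(file1, StandardCharsets.UTF_8);
             BufferedReader right = Files.newBufferedReader(file2, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            merge(left, right, out);
        }
    }

    // Helper for trying the merge on in-memory text instead of files.
    static String mergeStrings(String file1, String file2) {
        StringWriter out = new StringWriter();
        try {
            merge(new BufferedReader(new StringReader(file1)),
                  new BufferedReader(new StringReader(file2)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints a1,a2 / b1,b2 / c1,c2 on three lines
        System.out.print(mergeStrings("a1\nb1\nc1\n", "a2\nb2\nc2\n"));
    }
}
```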

My current combineRecords method looks like this:

private FileSheet combineRecords(ArrayList<FileSheet> toCombine) throws IOException
{
    ArrayList<String> filepaths = new ArrayList<String>();

    for (FileSheet sheetIterator : toCombine)
    {
        filepaths.add(sheetIterator.filepath);
    }

    String filepathAddition = "";

    for (String s : filepaths)
    {
        filepathAddition = filepathAddition + s.split(".select.")[1].replace(".csv", "")  + ".";
    }

    String outputFilepath = subsheetDirectory + fileHandle.getName().split(".csv")[0] + ".select." + filepathAddition +  "csv";

    Log.log("Output filepath "  + outputFilepath);

    long mainFileLength = toCombine.get(0).recordCount();

    for (FileSheet f : toCombine)
    {
        int ordinal = toCombine.indexOf(f);

        if (toCombine.get(ordinal).recordCount() != mainFileLength)
        {
            Log.log("Error : Record counts for 0 + " + ordinal);
            return null;
        }
    }

    FileSheet finalValues;

    Log.log("Starting iteration streams");
    BufferedWriter out = new BufferedWriter(new FileWriter(outputFilepath, false));

    List<BufferedReader> streams = new ArrayList<>();
    for (FileSheet j : toCombine)
    {
        streams.add(new BufferedReader(new FileReader(j.filepath)));
    }

    String finalWrite = "";

    for (int i = 0; i < toCombine.get(0).recordCount(); i++)
    {

        for (FileSheet j : toCombine)
        {
            int ordinal = toCombine.indexOf(j);

            finalWrite = finalWrite + streams.get(ordinal).readLine();

            if (toCombine.indexOf(j) != toCombine.size() - 1)
            {
                finalWrite = finalWrite + ",";
            }
            else
            {
                finalWrite = finalWrite + "\n";
            }
        }

        if (i % 1000 == 0 || i == toCombine.get(0).recordCount() - 1)
        {
            // out.write(finalWrite + "\n");
            Files.write(Paths.get(outputFilepath),(finalWrite).getBytes(),StandardOpenOption.APPEND);

            finalWrite = "";
        }           
    }
    out.close();


    Log.log("Finished combineRecords");

    finalValues = new FileSheet(outputFilepath,0);
    return finalValues;
}

I've tried both BufferedWriter and Files.write, and both take about 1 minute 30 seconds to create file3, but I'm not sure whether the bottleneck is reading or writing.
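One way to tell whether reading or writing dominates is to time a read-only pass over one input on its own and compare it with the full merge. The sketch below assumes UTF-8 input (the question's FileReader uses the platform default charset instead), and the class name `ReadTiming` and the command-line argument are hypothetical. If reading every line once takes only a fraction of a second, the time is being spent elsewhere in the loop rather than in the raw I/O.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTiming {

    // Reads every line and discards it, returning the number of lines read.
    static long countLines(BufferedReader reader) {
        long count = 0;
        try {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Times a single read-only pass over the file given on the command line.
    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ReadTiming <file>");
            return;
        }
        Path input = Paths.get(args[0]);
        long start = System.nanoTime();
        long lines;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            lines = countLines(reader);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```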

The sample files I'm using currently have 36,000 records, but the actual file I'll be using has about 650,000, so taking 1625 seconds (if it scales linearly) is completely infeasible for this operation.
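The linear estimate works out as follows, taking 1:30 as 90 s for 36,000 records. The second line is a hypothetical: if some step inside the loop costs time proportional to the file size (for instance, if recordCount() rereads the file, which the question does not show), total time grows quadratically instead.

```latex
t_{\text{linear}} \approx 90\ \text{s} \times \frac{650\,000}{36\,000} \approx 90\ \text{s} \times 18.06 \approx 1625\ \text{s} \approx 27\ \text{min}

t_{\text{quadratic}} \approx 90\ \text{s} \times \left(\frac{650\,000}{36\,000}\right)^{2} \approx 90\ \text{s} \times 326 \approx 29\,340\ \text{s} \approx 8\ \text{h}
```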

Edit: I've modified the code to open each file only once rather than on every iteration, but I'm now getting a "stream closed" error when skipping to the nth line. I thought that streams.get(ordinal).skip(i).findFirst().get(); would return a new stream instead of skipping and then closing the stream.
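The edit doesn't show how the streams were created; assuming they were java.util.stream.Stream objects (for example from Files.lines), the error follows from the fact that a Stream is single-use: a terminal operation such as findFirst() consumes it, and any further operation on it throws IllegalStateException. Opening a fresh stream per row and calling skip(i) would instead reread the file from the start for every row, which is quadratic. A BufferedReader, by contrast, remembers its position, so calling readLine() once per iteration already returns line i without skipping. The class `StreamReuseDemo` and its methods are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StreamReuseDemo {

    // A Stream is single-use: after a terminal operation such as findFirst() has run,
    // building another pipeline on the same stream throws IllegalStateException.
    static boolean reuseThrows() {
        Stream<String> lines = Stream.of("a1", "b1", "c1");
        lines.findFirst();                 // terminal operation consumes the stream
        try {
            lines.skip(1).findFirst();     // second use of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;                   // "stream has already been operated upon or closed"
        }
    }

    // A BufferedReader keeps its position, so calling readLine() once per
    // iteration yields line i on iteration i; no skip() is needed.
    static List<String> firstLines(BufferedReader reader, int n) {
        List<String> result = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                result.add(reader.readLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reuseThrows());            // true
        BufferedReader reader = new BufferedReader(new StringReader("a1\nb1\nc1\n"));
        System.out.println(firstLines(reader, 2));    // [a1, b1]
        System.out.println(firstLines(reader, 1));    // [c1]
    }
}
```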

Edit 2: I modified the code to use BufferedReaders instead of streams and to write to the file every 1000 lines. That suggests the bottleneck is reading, because it still takes about 1:30.
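One hedged observation about the loop above: it calls toCombine.get(0).recordCount() in the loop condition and again in the flush check, and toCombine.indexOf(j) several times per row. FileSheet is not shown, so it is an assumption that recordCount() counts lines by reading the file, but if it does, every iteration rereads a whole file, which would look like a reading bottleneck. The sketch below avoids both: it reads all inputs in lockstep until any reader returns null, so no record count is needed, and streams each row through a single BufferedWriter. Unlike combineRecords, it does not check up front that the record counts match. `MultiFileMerge` and its methods are hypothetical names; UTF-8 is assumed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MultiFileMerge {

    // Reads one line from each reader per row and writes them comma-separated.
    // Ends when any reader is exhausted; returns the number of rows written.
    static long mergeAll(List<BufferedReader> readers, Writer out) throws IOException {
        long rows = 0;
        List<String> row = new ArrayList<>(readers.size());
        while (true) {
            row.clear();
            for (BufferedReader reader : readers) {
                String line = reader.readLine();
                if (line == null) {
                    return rows;          // one input has no more lines: stop
                }
                row.add(line);
            }
            out.write(String.join(",", row));
            out.write('\n');
            rows++;
        }
    }

    // Files on disk: open every input once and stream rows through one BufferedWriter.
    static long mergeFiles(List<Path> inputs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : inputs) {
                readers.add(Files.newBufferedReader(input, StandardCharsets.UTF_8));
            }
            return mergeAll(readers, out);
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }

    // Helper for trying the merge on in-memory text instead of files.
    static String mergeStrings(String... contents) {
        List<BufferedReader> readers = new ArrayList<>();
        for (String text : contents) {
            readers.add(new BufferedReader(new StringReader(text)));
        }
        StringWriter out = new StringWriter();
        try {
            mergeAll(readers, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints a1,a2,a3 and b1,b2,b3 on two lines
        System.out.print(mergeStrings("a1\nb1\n", "a2\nb2\n", "a3\nb3\n"));
    }
}
```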

Upvotes: 3

Views: 524

Answers (1)

Vasif

Reputation: 778

First, concatenating strings with the + operator is fine outside a loop, but when you build up a string inside a loop you should use a StringBuilder for better performance.
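The reason, briefly: String is immutable, so each `s = s + x` inside a loop allocates a new String and copies everything accumulated so far, making the total work grow roughly quadratically with the output size; StringBuilder appends into one growable buffer. A small sketch building CSV rows this way, with the hypothetical helper name `buildRows`:

```java
import java.util.Arrays;
import java.util.List;

public class RowBuilder {

    // Appends every cell into one StringBuilder: commas between cells,
    // a newline after each row. No intermediate Strings are created per cell.
    static String buildRows(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int col = 0; col < row.size(); col++) {
                if (col > 0) {
                    sb.append(',');
                }
                sb.append(row.get(col));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2"));
        System.out.print(buildRows(rows));   // a1,a2 then b1,b2
    }
}
```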

Second, you can write to the file once at the end instead of every 1000 lines, like this:

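StringBuilder finalWrite = new StringBuilder();
// Read the record count once instead of calling recordCount() on every iteration
long recordCount = toCombine.get(0).recordCount();
int lastIndex = streams.size() - 1;

for (long i = 0; i < recordCount; i++)
{
    // Loop by index rather than calling toCombine.indexOf(j), which searches the list each time
    for (int ordinal = 0; ordinal <= lastIndex; ordinal++)
    {
        finalWrite.append(streams.get(ordinal).readLine());

        // Comma between columns, newline after the last column
        finalWrite.append(ordinal != lastIndex ? "," : "\n");
    }
}

// Write everything in a single call once all rows have been built
Files.write(Paths.get(outputFilepath), finalWrite.toString().getBytes());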

Upvotes: 1
