Reputation: 422
Is there a more efficient way than I'm currently using to merge two files line by line, appending each line from file2 onto the corresponding line of file1?
If file1 contains
a1
b1
c1
And file2 contains
a2
b2
c2
Then the output file should contain
a1,a2
b1,b2
c1,c2
The current combineRecords method looks like this:
private FileSheet combineRecords(ArrayList<FileSheet> toCombine) throws IOException
{
    // Build the output file path from the names of the input files
    ArrayList<String> filepaths = new ArrayList<String>();
    for (FileSheet sheetIterator : toCombine)
    {
        filepaths.add(sheetIterator.filepath);
    }
    String filepathAddition = "";
    for (String s : filepaths)
    {
        filepathAddition = filepathAddition + s.split(".select.")[1].replace(".csv", "") + ".";
    }
    String outputFilepath = subsheetDirectory + fileHandle.getName().split(".csv")[0] + ".select." + filepathAddition + "csv";
    Log.log("Output filepath " + outputFilepath);

    // Abort if the input files do not all have the same number of records
    long mainFileLength = toCombine.get(0).recordCount();
    for (FileSheet f : toCombine)
    {
        int ordinal = toCombine.indexOf(f);
        if (toCombine.get(ordinal).recordCount() != mainFileLength)
        {
            Log.log("Error : Record counts for 0 + " + ordinal);
            return null;
        }
    }

    FileSheet finalValues;
    Log.log("Starting iteration streams");
    BufferedWriter out = new BufferedWriter(new FileWriter(outputFilepath, false));

    // One reader per input file, opened once up front
    List<BufferedReader> streams = new ArrayList<>();
    for (FileSheet j : toCombine)
    {
        streams.add(new BufferedReader(new FileReader(j.filepath)));
    }

    // Merge line by line, appending to the output file every 1,000 records
    String finalWrite = "";
    for (int i = 0; i < toCombine.get(0).recordCount(); i++)
    {
        for (FileSheet j : toCombine)
        {
            int ordinal = toCombine.indexOf(j);
            finalWrite = finalWrite + streams.get(ordinal).readLine();
            if (toCombine.indexOf(j) != toCombine.size() - 1)
            {
                finalWrite = finalWrite + ",";
            }
            else
            {
                finalWrite = finalWrite + "\n";
            }
        }
        if (i % 1000 == 0 || i == toCombine.get(0).recordCount() - 1)
        {
            // out.write(finalWrite + "\n");
            Files.write(Paths.get(outputFilepath), finalWrite.getBytes(), StandardOpenOption.APPEND);
            finalWrite = "";
        }
    }
    out.close();
    Log.log("Finished combineRecords");
    finalValues = new FileSheet(outputFilepath, 0);
    return finalValues;
}
I've tried both BufferedWriter and Files.write, and they take similar times to create file3, both around 1 minute 30 seconds, but I'm not sure whether the bottleneck is in reading or writing.
The sample files I'm using currently have 36,000 records each, but the actual files will have ~650,000, so if it scales linearly that's about 1,625 seconds (90 s × 650,000 / 36,000), which is completely infeasible for this operation.
Edit: I've modified the code to only open the files once, rather than on every iteration, but I'm now getting a stream-closed error when skipping to the nth line.
I thought that streams.get(ordinal).skip(i).findFirst().get(); would give me a fresh view of the stream each time, rather than consuming and then closing the underlying stream.
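For reference, a stream from Files.lines can only be consumed once: skip(i).findFirst() is a terminal operation, and any further call on the same Stream object fails with an IllegalStateException ("stream has already been operated upon or closed"). A minimal sketch of the failure, assuming a placeholder file1.csv with at least four lines:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamReuseDemo
{
    public static void main(String[] args) throws IOException
    {
        Stream<String> lines = Files.lines(Paths.get("file1.csv"));

        // findFirst() is terminal, so this call consumes the stream.
        System.out.println(lines.skip(2).findFirst().orElse(""));

        // Reusing the same Stream object now throws
        // IllegalStateException: stream has already been operated upon or closed.
        System.out.println(lines.skip(3).findFirst().orElse(""));
    }
}
Keeping one BufferedReader per file and calling readLine() in order, as the current code does, avoids the problem entirely.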
Edit 2: I've modified the code to use BufferedReaders instead of streams and to write to the file every 1,000 lines read, and that has established that the bottleneck is reading, because it still takes ~1:30.
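One way to double-check that reading dominates is to time a read-only pass over the same inputs with no output at all; a rough sketch, using placeholder file names:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadOnlyTiming
{
    public static void main(String[] args) throws IOException
    {
        long start = System.nanoTime();
        long lineCount = 0;

        // Read both files end to end without writing anything,
        // so the elapsed time is (almost) pure read cost.
        for (String path : new String[] { "file1.csv", "file2.csv" })
        {
            try (BufferedReader reader = new BufferedReader(new FileReader(path)))
            {
                while (reader.readLine() != null)
                {
                    lineCount++;
                }
            }
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lineCount + " lines read in " + elapsedMs + " ms");
    }
}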
Upvotes: 3
Views: 524
Reputation: 778
First of all, concatenating strings with the + operator is fine when it is not inside a loop. But when you want to build up a string in a loop, you should use a StringBuilder for better performance.
The second thing you can improve is to write to the file once, at the end, like this:
StringBuilder finalWrite = new StringBuilder();
for (int i = 0; i < toCombine.get(0).recordCount(); i++)
{
    for (FileSheet j : toCombine)
    {
        int ordinal = toCombine.indexOf(j);
        finalWrite.append(streams.get(ordinal).readLine());
        if (toCombine.indexOf(j) != toCombine.size() - 1)
        {
            finalWrite.append(",");
        }
        else
        {
            finalWrite.append("\n");
        }
    }
}
Files.write(Paths.get(outputFilepath), finalWrite.toString().getBytes());
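If buffering the whole output for ~650,000 records in a single StringBuilder uses too much memory, a middle ground is to keep the StringBuilder but flush it through one BufferedWriter every few thousand records; a sketch, assuming the same toCombine, streams and outputFilepath variables as above:
try (BufferedWriter out = new BufferedWriter(new FileWriter(outputFilepath, false)))
{
    StringBuilder chunk = new StringBuilder();
    long recordCount = toCombine.get(0).recordCount();
    for (int i = 0; i < recordCount; i++)
    {
        for (int ordinal = 0; ordinal < streams.size(); ordinal++)
        {
            chunk.append(streams.get(ordinal).readLine());
            chunk.append(ordinal == streams.size() - 1 ? '\n' : ',');
        }
        // Flush the buffered chunk every 5,000 records instead of holding everything in memory.
        if (i % 5000 == 0 || i == recordCount - 1)
        {
            out.write(chunk.toString());
            chunk.setLength(0);
        }
    }
}
Whether the per-chunk flush or the single write at the end is faster will depend on the I/O subsystem, so it is worth measuring both.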
Upvotes: 1