Reputation: 1179
I am trying to divide a binary file (such as a video, audio or image file) into chunks of 100 KB each and then join those chunks back together to get the original file. My code seems to work in the sense that it divides the file and joins the chunks, and the file I get back is the same size as the original. However, the contents appear truncated: if it's a video file, playback stops after 2 seconds; if it's an image file, only the upper part looks correct.
Here is the code I am using (I can post the entire code if you like):
For dividing:
File ifile = new File(fname);
FileInputStream fis;
String newName;
FileOutputStream chunk;
int fileSize = (int) ifile.length();
int nChunks = 0, read = 0, readLength = Chunk_Size;
byte[] byteChunk;
try {
    fis = new FileInputStream(ifile);
    StupidTest.size = (int)ifile.length();
    while (fileSize > 0) {
        if (fileSize <= Chunk_Size) {
            readLength = fileSize;
        }
        byteChunk = new byte[readLength];
        read = fis.read(byteChunk, 0, readLength);
        fileSize -= read;
        assert(read==byteChunk.length);
        nChunks++;
        newName = fname + ".part" + Integer.toString(nChunks - 1);
        chunk = new FileOutputStream(new File(newName));
        chunk.write(byteChunk);
        chunk.flush();
        chunk.close();
        byteChunk = null;
        chunk = null;
    }
    fis.close();
    fis = null;
And for joining the files, I put the names of all chunks in a List, sort it by name and then run the following code:
File ofile = new File(fname);
FileOutputStream fos;
FileInputStream fis;
byte[] fileBytes;
int bytesRead = 0;
try {
    fos = new FileOutputStream(ofile,true);
    for (File file : files) {
        fis = new FileInputStream(file);
        fileBytes = new byte[(int) file.length()];
        bytesRead = fis.read(fileBytes, 0,(int) file.length());
        assert(bytesRead == fileBytes.length);
        assert(bytesRead == (int) file.length());
        fos.write(fileBytes);
        fos.flush();
        fileBytes = null;
        fis.close();
        fis = null;
    }
    fos.close();
    fos = null;
Upvotes: 21
Views: 22465
Reputation: 53
I wrote a Kotlin version of the original "splitting a file" part of the code. I didn't solve the file-naming problem mentioned above, because I'm uploading the chunks in order to the Drive API, and I think gluing the chunks back together is a problem for the Drive API to solve.
Since it took me a while to figure out what exactly was happening, because the original code has no documentation, I've commented it to help others understand it better.
As a final note, I modified the code slightly so that it is a function that returns a list of the chunk files.
Here's the code:
import android.util.Log
import java.io.File
import java.io.FileInputStream
import java.io.FileNotFoundException
import java.io.FileOutputStream
// TAG is assumed to be a String log tag defined elsewhere in the class
private fun splitFile(file: File, chunkSize: Int = 524288): List<File> {
    val resultedFiles = mutableListOf<File>()
    // This will read file data for us
    val inputStream = FileInputStream(file)
    // At first, the whole file remains to be processed
    var remainingDataSize = file.length()
    // We have 0 chunks at first
    var nChunks = 0
    // Every chunk has the requested size except for the last one
    var currentChunkSize = chunkSize
    // While data remains to be read
    while (remainingDataSize > 0) {
        // If the remaining data is smaller than a chunk, the last chunk gets the remaining size
        if (remainingDataSize <= chunkSize) {
            currentChunkSize = remainingDataSize.toInt()
        }
        // Initialize a byte array for our chunk data
        val byteChunk = ByteArray(currentChunkSize)
        /* Read data from the file into byteChunk. The offset passed to read() is an
         * offset into byteChunk, not into the file: inputStream was created before
         * the loop and remembers how far into the file it has already read.
         * A single read() may return fewer bytes than requested, so keep reading
         * until the chunk is full or the stream ends. */
        var read = 0
        while (read < currentChunkSize) {
            val n = inputStream.read(byteChunk, read, currentChunkSize - read)
            if (n < 0) break
            read += n
        }
        // Stop if the file ended earlier than its reported length
        if (read == 0) break
        // Subtract the read data size from remainingDataSize
        remainingDataSize -= read
        // Assert that we read as much as calculated (JVM assertions are off by default)
        assert(read == byteChunk.size) { "There was a problem in chunk size calculations or reading process." }
        // We are going to add a chunk to our result
        nChunks++
        // Initialize the new file
        val newFileName = file.name + ".part" + nChunks
        val newFile = File(file.parent, newFileName)
        try {
            // Write data to the new file; use {} closes the stream even if write() throws
            FileOutputStream(newFile).use { it.write(byteChunk, 0, read) }
        } catch (e: FileNotFoundException) {
            Log.e(TAG, "splitFile: File not found: ${e.message}", e)
        } catch (e: SecurityException) {
            Log.e(TAG, "splitFile: Security error: ${e.message}", e)
        }
        // Add the new file to the result list
        resultedFiles.add(newFile)
    }
    inputStream.close()
    return resultedFiles
}
I think it's also worth mentioning that it's better to delete the chunk files after using them, so your app's storage doesn't grow for no reason. Every File object has a delete() method.
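A minimal Java sketch of that clean-up step, assuming the chunks are held in a List<File> like the one the Kotlin splitFile above returns. ChunkCleanup and deleteChunks are hypothetical names; the sketch uses Files.deleteIfExists so that a chunk which is already gone does not count as a failure.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ChunkCleanup {
    /**
     * Deletes every chunk file in the list. Returns true only if no deletion
     * failed; files that were already missing count as deleted.
     */
    static boolean deleteChunks(List<File> chunks) {
        boolean allDeleted = true;
        for (File chunk : chunks) {
            try {
                Files.deleteIfExists(chunk.toPath());
            } catch (IOException e) {
                // e.g. a permission problem: remember it, but keep deleting the rest
                allDeleted = false;
            }
        }
        return allDeleted;
    }
}
```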
Upvotes: 0
Reputation: 16832
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
// Closer.closeSilently(...) comes from a helper class that is not part of the JDK
public class FileSplitter {
    private static final int BUFSIZE = 4*1024;
    public boolean needsSplitting(String file, int chunkSize) {
        return new File(file).length() > chunkSize;
    }
    private static boolean isASplitFileChunk(String file) {
        return chunkIndexLen(file) > 0;
    }
    private static int chunkIndexLen(String file) {
        int n = numberOfTrailingDigits(file);
        if (n > 0) {
            String zeroes = new String(new char[n]).replace("\0", "0");
            if (file.matches(".*\\.part[0-9]{"+n+"}?of[0-9]{"+n+"}?$") && !file.endsWith(zeroes) && !chunkNumberStr(file, n).equals(zeroes)) {
                return n;
            }
        }
        return 0;
    }
    private static String getWholeFileName(String chunkName) {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            return chunkName.substring(0, chunkName.length() - 7 - 2*n); // 7+2n: 1+4+n+2+n : .part012of345
        }
        return chunkName;
    }
    private static int getNumberOfChunks(String filename) {
        int n = chunkIndexLen(filename);
        if (n > 0) {
            try {
                String digits = chunksTotalStr(filename, n);
                return Integer.parseInt(digits);
            } catch (NumberFormatException x) { // should never happen
            }
        }
        return 1;
    }
    private static int getChunkNumber(String filename) {
        int n = chunkIndexLen(filename);
        if (n > 0) {
            try {
                // filename.part001of200
                String digits = chunkNumberStr(filename, n);
                return Integer.parseInt(digits)-1;
            } catch (NumberFormatException x) {
            }
        }
        return 0;
    }
    private static int numberOfTrailingDigits(String s) {
        int n=0, l=s.length()-1;
        while (l>=0 && Character.isDigit(s.charAt(l))) {
            n++; l--;
        }
        return n;
    }
    private static String chunksTotalStr(String filename, int chunkIndexLen) {
        return filename.substring(filename.length()-chunkIndexLen);
    }
    protected static String chunkNumberStr(String filename, int chunkIndexLen) {
        int p = filename.length() - 2 - 2*chunkIndexLen; // 123of456
        return filename.substring(p,p+chunkIndexLen);
    }
    // 0,8 ==> part1of8; 7,8 ==> part8of8
    private static String chunkFileName(String filename, int n, int total, int chunkIndexLength) {
        return filename+String.format(".part%0"+chunkIndexLength+"dof%0"+chunkIndexLength+"d", n+1, total);
    }
    public static String[] splitFile(String fname, long chunkSize) throws IOException {
        FileInputStream fis = null;
        ArrayList<String> res = new ArrayList<String>();
        byte[] buffer = new byte[BUFSIZE];
        try {
            long totalSize = new File(fname).length();
            int nChunks = (int) ((totalSize + chunkSize - 1) / chunkSize);
            int chunkIndexLength = String.format("%d", nChunks).length();
            fis = new FileInputStream(fname);
            long written = 0;
            for (int i=0; written<totalSize; i++) {
                String chunkFName = chunkFileName(fname, i, nChunks, chunkIndexLength);
                FileOutputStream fos = new FileOutputStream(chunkFName);
                try {
                    written += copyStream(fis, buffer, fos, chunkSize);
                } finally {
                    Closer.closeSilently(fos);
                }
                res.add(chunkFName);
            }
        } finally {
            Closer.closeSilently(fis);
        }
        return res.toArray(new String[0]);
    }
    public static boolean canJoinFile(String chunkName) {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            for (int i=0; i<nChunks; i++) {
                if (!new File(chunkFileName(filename, i, nChunks, n)).exists()) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }
    public static void joinChunks(String chunkName) throws IOException {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            byte[] buffer = new byte[BUFSIZE];
            FileOutputStream fos = new FileOutputStream(filename);
            try {
                for (int i=0; i<nChunks; i++) {
                    FileInputStream fis = new FileInputStream(chunkFileName(filename, i, nChunks, n));
                    try {
                        copyStream(fis, buffer, fos, -1);
                    } finally {
                        Closer.closeSilently(fis);
                    }
                }
            } finally {
                Closer.closeSilently(fos);
            }
        }
    }
    public static boolean deleteAllChunks(String chunkName) {
        boolean res = true;
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            for (int i=0; i<nChunks; i++) {
                File f = new File(chunkFileName(filename, i, nChunks, n));
                res &= (f.delete() || !f.exists());
            }
        }
        return res;
    }
    // Copies up to maxAmount bytes (or everything, if maxAmount < 0); returns the number copied
    private static long copyStream(FileInputStream fis, byte[] buffer, FileOutputStream fos, long maxAmount) throws IOException {
        long chunkSizeWritten;
        for (chunkSizeWritten=0; chunkSizeWritten<maxAmount || maxAmount<0; ) {
            int toRead = maxAmount < 0 ? buffer.length : (int)Math.min(buffer.length, maxAmount - chunkSizeWritten);
            int lengthRead = fis.read(buffer, 0, toRead);
            if (lengthRead < 0) {
                break;
            }
            fos.write(buffer, 0, lengthRead);
            chunkSizeWritten += lengthRead;
        }
        return chunkSizeWritten;
    }
}
Borrow Closer here or from org.apache.logging.log4j.core.util.
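If neither source is at hand, a stand-in with the shape the class above expects is short. This is a hypothetical sketch of a Closer.closeSilently(...) helper, not the linked code or log4j's implementation: it closes the resource if it is non-null and swallows any IOException.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical minimal helper matching the Closer.closeSilently(...) calls above. */
final class Closer {
    private Closer() {}

    /** Closes the resource if it is non-null, ignoring any IOException from close(). */
    static void closeSilently(Closeable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (IOException ignored) {
            // Deliberately ignored: this is called from finally blocks where
            // a close failure cannot be handled any further.
        }
    }
}
```

On Java 7 and later, try-with-resources (for example `try (FileOutputStream fos = new FileOutputStream(chunkFName)) { ... }`) does the same job without a helper, and it does not hide a failing close() on an output stream, which can signal that buffered data was never written.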
Upvotes: 3
Reputation: 1
It takes the name of the file to split and the destination chunk size (in bytes) from the user, and splits the file into sub-files. It works for all types of files (.bin, .jpg, .rar, ...).
import java.io.*;
class split{
    public static void main(String args[])throws IOException {
        Console con=System.console();
        System.out.println("Enter File Name: ");
        File f=new File(con.readLine());
        System.out.println("Enter Destination File Size: ");
        int b=Integer.parseInt(con.readLine());
        long len=f.length();
        // Number of parts, rounded up for a final partial part (long math avoids int overflow)
        int c=(int)(len/b);
        if(len%b!=0)
            c++;
        // Buffered, because the loop below reads and writes one byte at a time
        InputStream fis=new BufferedInputStream(new FileInputStream(f));
        for(int i=0;i<c;i++){
            // Parts are named 0.<name>, 1.<name>, ...
            File f1=new File(i+"."+f);
            OutputStream fos=new BufferedOutputStream(new FileOutputStream(f1));
            for(int j=0;j<b;j++){
                int ch;
                if((ch=fis.read())!=-1)
                    fos.write(ch);
            }
            // Close each part so its data is flushed and the file handle is released
            fos.close();
        }
        fis.close();
        System.out.println("Operation Successful");
    }
}
And another program merges all the split files. It takes only the original file name and merges all the parts.
import java.io.*;
class merge{
    public static void main(String args[])throws IOException{
        Console con=System.console();
        System.out.println("Enter File to be retrieved: ");
        File f=new File(con.readLine());
        // Overwrite any existing file instead of appending to it
        OutputStream fos=new BufferedOutputStream(new FileOutputStream(f));
        int i=0;
        File f1=new File(i+"."+f);
        // Check for the next part name on every pass: 0.<name>, 1.<name>, ...
        while(f1.exists()) {
            InputStream fis=new BufferedInputStream(new FileInputStream(f1));
            int ch;
            while((ch=fis.read())!=-1)
                fos.write(ch);
            fis.close();
            i++;
            f1=new File(i+"."+f);
        }
        fos.close();
    }
}
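Copying one byte at a time works but is slow for large files. As an alternative, here is a sketch of the same merge using InputStream.transferTo (Java 9+), which copies in blocks. MergeParts and merge are hypothetical names, and passing the directory and bare file name separately is an assumption of this sketch; it keeps the 0.<name>, 1.<name>, ... naming from the split program above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MergeParts {
    /**
     * Joins dir/0.<name>, dir/1.<name>, ... into target, stopping at the
     * first missing index. Returns the number of parts merged.
     */
    static int merge(File dir, String name, File target) throws IOException {
        int count = 0;
        // No append flag: the target starts empty instead of growing on each run
        try (OutputStream out = new FileOutputStream(target)) {
            File part;
            while ((part = new File(dir, count + "." + name)).exists()) {
                try (InputStream in = new FileInputStream(part)) {
                    in.transferTo(out); // copies the whole part in buffered blocks
                }
                count++;
            }
        }
        return count;
    }
}
```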
Upvotes: 0
Reputation: 11
For splitting the file:
import java.io.*;
class Split
{
    public static void main(String args[])throws IOException
    {
        Console con=System.console();
        System.out.println("enter the file name");
        String path=con.readLine();
        File f= new File(path);
        long filesize=f.length();
        FileInputStream fis= new FileInputStream(path);
        int size;
        System.out.println("enter file size for split");
        size=Integer.parseInt(con.readLine());
        byte b[]=new byte[size];
        int ch,c=0;
        while(filesize>0)
        {
            // read() may return fewer than 'size' bytes; only the ch bytes read are written
            ch=fis.read(b,0,size);
            if(ch<0)
                break; // the file is shorter than f.length() reported
            filesize = filesize-ch;
            String fname=c+"."+f.getName();
            c++;
            FileOutputStream fos= new FileOutputStream(new File(fname));
            fos.write(b,0,ch);
            fos.flush();
            fos.close();
        }
        fis.close();
    }
}
Upvotes: 1
Reputation: 533492
What happens when you do a binary comparison of the files, e.g. with diff? Do you see a difference after the first file?
Can you try splitting a text (.txt) file? If bytes are out of place, it should be more obvious what is going wrong, e.g. a repeated block or file, or data full of NUL bytes.
EDIT: As others have noticed, you don't read the files in numeric order. What you can do is use a zero-padded file number, like this:
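One way to do that binary comparison from Java is Files.mismatch (Java 12+), which returns the offset of the first differing byte, or -1 when the files are identical. Dividing that offset by the chunk size used for splitting tells you which chunk went wrong. CompareFiles and firstBadChunk are hypothetical names for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompareFiles {
    /**
     * Returns the index of the chunk in which original and rejoined first
     * differ, or -1 if the files are identical. chunkSize must be the same
     * value that was used when splitting.
     */
    static long firstBadChunk(Path original, Path rejoined, long chunkSize) throws IOException {
        long offset = Files.mismatch(original, rejoined);
        return offset < 0 ? -1 : offset / chunkSize;
    }
}
```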
newName = String.format("%s.part%09d", fname, nChunks - 1);
This will give you up to 1 billion files in numeric order.
When you read the files, you need to ensure they are sorted.
Arrays.sort(files);
for (File file : files) {
Using a custom comparator, as others have suggested, would avoid the need for padded numbers, but it can be nice to be able to sort by name to get the correct order, e.g. in Explorer.
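A small sketch of why the padding works: every name has the same number of digits, so the character-by-character order used by Arrays.sort on File (which compares path names) matches numeric order. PaddedNames and partName are hypothetical names; the format string is the one suggested above.

```java
import java.io.File;
import java.util.Arrays;

public class PaddedNames {
    // Nine zero-padded digits, as in String.format("%s.part%09d", fname, nChunks - 1)
    static String partName(String fname, int index) {
        return String.format("%s.part%09d", fname, index);
    }

    public static void main(String[] args) {
        File[] files = new File[12];
        // Create the names in reverse order so the sort has something to do
        for (int i = 0; i < files.length; i++) {
            files[i] = new File(partName("video.mp4", files.length - 1 - i));
        }
        Arrays.sort(files); // File compares path names as strings
        for (File f : files) {
            System.out.println(f.getName()); // video.mp4.part000000000 ... video.mp4.part000000011
        }
    }
}
```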
Upvotes: 0
Reputation: 126
Are there more than 10 chunks? Then the program will concatenate *.part1 + *.part10 + *.part2 and so on.
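To see that order for yourself, this hypothetical demo builds names the way the question's code does (fname + ".part" + index, starting at 0) and sorts them as plain strings, which is lexicographic rather than numeric.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnpaddedOrder {
    /** Builds unpadded part names and sorts them with String's natural order. */
    static List<String> sortedNames(String fname, int nChunks) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nChunks; i++) {
            names.add(fname + ".part" + i);
        }
        Collections.sort(names); // character by character: "10" sorts before "2"
        return names;
    }

    public static void main(String[] args) {
        // [f.part0, f.part1, f.part10, f.part11, f.part2, f.part3, ..., f.part9]
        System.out.println(sortedNames("f", 12));
    }
}
```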
Upvotes: 1
Reputation: 1108712
I can spot only 2 potential mistakes in the code:
int fileSize = (int) ifile.length();
The above fails when the file is over 2 GB, since an int cannot hold a larger value.
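A sketch of a split loop that keeps the remaining size in a long, so files over 2 GB work. It also uses InputStream.readNBytes (Java 9+), which keeps reading until it has the requested number of bytes or reaches end of stream, whereas a single read() call may return fewer bytes. LongSizeSplitter and split are hypothetical names, and the zero-padded ".part%09d" naming is an assumption borrowed from another answer in this thread so the chunks also sort correctly by name.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class LongSizeSplitter {
    /** Splits source into chunks of at most chunkSize bytes and returns the chunk files. */
    static List<File> split(File source, int chunkSize) throws IOException {
        List<File> chunks = new ArrayList<>();
        long remaining = source.length(); // long: no overflow above Integer.MAX_VALUE
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = new FileInputStream(source)) {
            for (int index = 0; remaining > 0; index++) {
                int wanted = (int) Math.min(chunkSize, remaining);
                int got = in.readNBytes(buffer, 0, wanted); // 0 only at end of stream
                if (got == 0) {
                    break; // the file shrank while we were reading it
                }
                File chunk = new File(source.getPath() + String.format(".part%09d", index));
                try (OutputStream out = new FileOutputStream(chunk)) {
                    out.write(buffer, 0, got); // write only the bytes actually read
                }
                chunks.add(chunk);
                remaining -= got;
            }
        }
        return chunks;
    }
}
```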
newName = fname + ".part" + Integer.toString(nChunks - 1);
A file name constructed like that has to be sorted in a very specific manner. With default string sorting, name.part10 comes before name.part2. You'd want to supply a custom Comparator that extracts and parses the part number as an int and then compares by that instead.
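Such a Comparator might look like the following sketch, assuming chunk names end in ".part" followed by a decimal number, as in the question. PartNumberOrder, partNumber and BY_PART_NUMBER are hypothetical names.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class PartNumberOrder {
    /** Extracts N from a name ending in ".partN", e.g. "movie.avi.part12" gives 12. */
    static int partNumber(File file) {
        String name = file.getName();
        int idx = name.lastIndexOf(".part");
        if (idx < 0) {
            throw new IllegalArgumentException("Not a chunk file: " + name);
        }
        return Integer.parseInt(name.substring(idx + ".part".length()));
    }

    /** Orders chunk files by their numeric part number instead of by name. */
    static final Comparator<File> BY_PART_NUMBER = Comparator.comparingInt(PartNumberOrder::partNumber);

    public static void main(String[] args) {
        File[] files = { new File("movie.avi.part10"), new File("movie.avi.part2"), new File("movie.avi.part0") };
        Arrays.sort(files, BY_PART_NUMBER);
        System.out.println(Arrays.toString(files)); // [movie.avi.part0, movie.avi.part2, movie.avi.part10]
    }
}
```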
Upvotes: 12
Reputation: 61509
And for joining file, I put the names of all chunks in a List, then sort it by name and then run the following code:
But your names are of the following form:
newName = fname + ".part" + Integer.toString(nChunks - 1);
Think carefully about what happens if you have 11 or more parts. Which string comes first in alphabetical order: ".part10" or ".part2"? (Answer: ".part10", since '1' comes before '2' in the character encoding.)
Upvotes: 4