Vinay Pandey

Reputation: 1179

Split and join back a binary file in java

I am trying to divide a binary file (such as video, audio, or an image) into chunks of 100 KB each and then join those chunks to get the original file back. My code seems to work, in the sense that it divides the file and joins the chunks, and the file I get back is the same size as the original. However, the contents get truncated: a video file stops playing after 2 seconds, and only the upper part of an image file looks correct.

Here is the code I am using (I can post the entire code if you like):

For dividing:

File ifile = new File(fname); 
FileInputStream fis;
String newName;
FileOutputStream chunk;
int fileSize = (int) ifile.length();
int nChunks = 0, read = 0, readLength = Chunk_Size;
byte[] byteChunk;
try {
    fis = new FileInputStream(ifile);
    StupidTest.size = (int)ifile.length();
    while (fileSize > 0) {
        if (fileSize <= Chunk_Size) {
            readLength = fileSize;
        }
        byteChunk = new byte[readLength];
        read = fis.read(byteChunk, 0, readLength);
        fileSize -= read;
        assert(read==byteChunk.length);
        nChunks++;
        newName = fname + ".part" + Integer.toString(nChunks - 1);
        chunk = new FileOutputStream(new File(newName));
        chunk.write(byteChunk);
        chunk.flush();
        chunk.close();
        byteChunk = null;
        chunk = null;
    }
    fis.close();
    fis = null;

And for joining file, I put the names of all chunks in a List, then sort it by name and then run the following code:

File ofile = new File(fname);
FileOutputStream fos;
FileInputStream fis;
byte[] fileBytes;
int bytesRead = 0;
try {
    fos = new FileOutputStream(ofile,true);             
    for (File file : files) {
        fis = new FileInputStream(file);
        fileBytes = new byte[(int) file.length()];
        bytesRead = fis.read(fileBytes, 0,(int)  file.length());
        assert(bytesRead == fileBytes.length);
        assert(bytesRead == (int) file.length());
        fos.write(fileBytes);
        fos.flush();
        fileBytes = null;
        fis.close();
        fis = null;
    }
    fos.close();
    fos = null;

Upvotes: 21

Views: 22465

Answers (8)

Reza Dizaji

Reputation: 53

I wrote a Kotlin version of the original "splitting a file" part of the code. I didn't solve the file name problem mentioned above because I'm uploading the chunks in order to the Drive API, and I think gluing the chunks back together is a problem for the Drive API to solve.

Anyway, since it took me a while to figure out exactly what was happening because of the lack of documentation in the original code, I thought, why not help others understand it better.

As a final note, I modified the code a little so that it is a function and returns a list of files.

Here's the code:

private fun splitFile(file: File, chunkSize: Int = 524288): List<File> {
    val resultedFiles = mutableListOf<File>()

    // This will read file data for us
    val inputStream = FileInputStream(file)

    // Initially, the whole file remains to be processed
    var remainingDataSize = file.length()
    // We have 0 chunks at first
    var nChunks = 0
    // Every chunk size is the requested size except for the last chunk
    var currentChunkSize = chunkSize

    // While there is data left to read
    while (remainingDataSize > 0) {
        // If the remaining data is no larger than the chunk size, shrink the last chunk to the remaining size
        if (remainingDataSize <= chunkSize) {
            currentChunkSize = remainingDataSize.toInt()
        }

        // Initialize a byteArray for our chunk data
        val byteChunk = ByteArray(currentChunkSize)

        /** Read data from file to our byteChunk as much as needed.
         * We don't have to specify an $offset parameter for this method
         * because we initialized inputStream before the loop and it remembers
         * to which point it has read the data.*/
        val read = inputStream.read(byteChunk, 0, currentChunkSize)
        // Subtract the read data size from remainingDataSize
        remainingDataSize -= read

        // Assert that we read as much as calculated.
        assert(read == byteChunk.size) { "There was a problem in chunk size calculations or reading process." }

        // We are about to add a chunk to the result list
        nChunks++

        // Initialize new file
        val newFileName = file.name + ".part" + nChunks
        val newFile = File(file.parent, newFileName)

        try {
            // Write data to our new file
            FileOutputStream(newFile).run {
                write(byteChunk)
                flush()
                close()
            }
        } catch (e: Exception) {
            if (e is FileNotFoundException) Log.e(
                TAG,
                "splitFile: File not found: ${e.message}",
                e
            )
            else if (e is SecurityException) Log.e(
                TAG,
                "splitFile: Security error: ${e.message}",
                e
            )
        }
        // Add the new file to result list
        resultedFiles.add(newFile)
    }
    inputStream.close()

    return resultedFiles
}

It's also worth mentioning that it's better to delete the files after use, so your app's storage footprint doesn't grow for no reason. Every File object has a delete() method.

Upvotes: 0

18446744073709551615

Reputation: 16832

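import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;

public class FileSplitter {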
    private static final int BUFSIZE = 4*1024;
    public boolean needsSplitting(String file, int chunkSize) {
        return new File(file).length() > chunkSize;
    }
    private static boolean isASplitFileChunk(String file) {
        return chunkIndexLen(file) > 0;
    }
    private static int chunkIndexLen(String file) {
        int n = numberOfTrailingDigits(file);
        if (n > 0) {
            String zeroes = new String(new char[n]).replace("\0", "0");
            if (file.matches(".*\\.part[0-9]{"+n+"}?of[0-9]{"+n+"}?$") && !file.endsWith(zeroes) && !chunkNumberStr(file, n).equals(zeroes)) {
                return n;
            }
        }
        return 0;
    }
    private static String getWholeFileName(String chunkName) {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            return chunkName.substring(0, chunkName.length() - 7 - 2*n); // 7+2n: 1+4+n+2+n : .part012of345
        }
        return chunkName;
    }
    private static int getNumberOfChunks(String filename) {
        int n = chunkIndexLen(filename);
        if (n > 0) {
            try {
                String digits = chunksTotalStr(filename, n);
                return Integer.parseInt(digits);
            } catch (NumberFormatException x) { // should never happen
            }
        }
        return 1;
    }
    private static int getChunkNumber(String filename) {
        int n = chunkIndexLen(filename);
        if (n > 0) {
            try {
                // filename.part001of200
                String digits = chunkNumberStr(filename, n);
                return Integer.parseInt(digits)-1;
            } catch (NumberFormatException x) {
            }
        }
        return 0;
    }
    private static int numberOfTrailingDigits(String s) {
        int n=0, l=s.length()-1;
        while (l>=0 && Character.isDigit(s.charAt(l))) {
            n++; l--;
        }
        return n;
    }
    private static String chunksTotalStr(String filename, int chunkIndexLen) {
        return filename.substring(filename.length()-chunkIndexLen);
    }
    protected static String chunkNumberStr(String filename, int chunkIndexLen) {
        int p = filename.length() - 2 - 2*chunkIndexLen; // 123of456
        return filename.substring(p,p+chunkIndexLen);
    }
    // 0,8 ==> part1of8; 7,8 ==> part8of8
    private static String chunkFileName(String filename, int n, int total, int chunkIndexLength) {
        return filename+String.format(".part%0"+chunkIndexLength+"dof%0"+chunkIndexLength+"d", n+1, total);
    }
    public static String[] splitFile(String fname, long chunkSize) throws IOException {
        FileInputStream fis = null;
        ArrayList<String> res = new ArrayList<String>();
        byte[] buffer = new byte[BUFSIZE];
        try {
            long totalSize = new File(fname).length();
            int nChunks = (int) ((totalSize + chunkSize - 1) / chunkSize);
            int chunkIndexLength = String.format("%d", nChunks).length();
            fis = new FileInputStream(fname);
            long written = 0;
            for (int i=0; written<totalSize; i++) {
                String chunkFName = chunkFileName(fname, i, nChunks, chunkIndexLength);
                FileOutputStream fos = new FileOutputStream(chunkFName);
                try {
                    written += copyStream(fis, buffer, fos, chunkSize);
                } finally {
                    Closer.closeSilently(fos);
                }
                res.add(chunkFName);
            }
        } finally {
            Closer.closeSilently(fis);
        }
        return res.toArray(new String[0]);
    }
    public static boolean canJoinFile(String chunkName) {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            for (int i=0; i<nChunks; i++) {
                if (!new File(chunkFileName(filename, i, nChunks, n)).exists()) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }
    public static void joinChunks(String chunkName) throws IOException {
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            byte[] buffer = new byte[BUFSIZE];
            FileOutputStream fos = new FileOutputStream(filename);
            try {
                for (int i=0; i<nChunks; i++) {
                    FileInputStream fis = new FileInputStream(chunkFileName(filename, i, nChunks, n));
                    try {
                        copyStream(fis, buffer, fos, -1);
                    } finally {
                        Closer.closeSilently(fis);
                    }
                }
            } finally {
                Closer.closeSilently(fos);
            }
        }
    }
    public static boolean deleteAllChunks(String chunkName) {
        boolean res = true;
        int n = chunkIndexLen(chunkName);
        if (n>0) {
            int nChunks = getNumberOfChunks(chunkName);
            String filename = getWholeFileName(chunkName);
            for (int i=0; i<nChunks; i++) {
                File f = new File(chunkFileName(filename, i, nChunks, n));
                res &= (f.delete() || !f.exists());
            }
        }
        return res;
    }
    private static long copyStream(FileInputStream fis, byte[] buffer, FileOutputStream fos, long maxAmount) throws IOException {
        long chunkSizeWritten;
        for (chunkSizeWritten=0; chunkSizeWritten<maxAmount || maxAmount<0; ) {
            int toRead = maxAmount < 0 ? buffer.length : (int)Math.min(buffer.length, maxAmount - chunkSizeWritten);
            int lengthRead = fis.read(buffer, 0, toRead);
            if (lengthRead < 0) {
                break;
            }
            fos.write(buffer, 0, lengthRead);
            chunkSizeWritten += lengthRead;
        }
        return chunkSizeWritten;
    }
}

Borrow Closer here or from org.apache.logging.log4j.core.util.

Upvotes: 3

Devendra

Reputation: 1

This program takes the name of the file to split and the destination file size (in bytes) from the user and splits the file into sub-files. It works for all types of files (.bin, .jpg, .rar, and so on).

import java.io.*;

class split {
    public static void main(String args[]) throws IOException {
        Console con = System.console();
        System.out.println("Enter File Name: ");
        File f = new File(con.readLine());
        System.out.println("Enter Destination File Size: ");
        int b = Integer.parseInt(con.readLine());
        long len = f.length();
        // Number of sub-files, rounded up so a partial last chunk is counted
        int c = (int) (len / b);
        if (len % b != 0)
            c++;
        FileInputStream fis = new FileInputStream(f);
        for (int i = 0; i < c; i++) {
            // Sub-files are named 0.<name>, 1.<name>, ...
            File f1 = new File(i + "." + f);
            FileOutputStream fos = new FileOutputStream(f1);
            // Copy up to b bytes, one byte at a time, into this sub-file
            for (int j = 0; j < b; j++) {
                int ch;
                if ((ch = fis.read()) != -1)
                    fos.write(ch);
            }
            fos.close();
        }
        fis.close();
        System.out.println("Operation Successful");
    }
}

Another program merges all the split files. It takes only the name of the file that was split and merges all of its sub-files.

import java.io.*;

class merge {
    public static void main(String args[]) throws IOException {
        Console con = System.console();
        System.out.println("Enter File to be retrieved: ");
        File f = new File(con.readLine());
        FileOutputStream fos = new FileOutputStream(f, true);
        int i = 0;
        // Append sub-files 0.<name>, 1.<name>, ... until the next one is missing
        File f1 = new File(i + "." + f);
        while (f1.exists()) {
            FileInputStream fis = new FileInputStream(f1);
            int ch;
            while ((ch = fis.read()) != -1) {
                fos.write(ch);
            }
            fis.close();
            i++;
            f1 = new File(i + "." + f);
        }
        fos.close();
    }
}

Upvotes: 0

Manglesh pareek

Reputation: 11

For splitting the file:

import java.io.*;

class Split {
    public static void main(String args[]) throws IOException {
        Console con = System.console();
        System.out.println("enter the file name");
        String path = con.readLine();
        File f = new File(path);
        int filesize = (int) f.length();
        FileInputStream fis = new FileInputStream(path);

        System.out.println("enter file size for split");
        int size = Integer.parseInt(con.readLine());

        byte b[] = new byte[size];
        int ch, c = 0;

        while (filesize > 0) {
            // ch is the number of bytes actually read into the buffer
            ch = fis.read(b, 0, size);
            filesize = filesize - ch;

            // Chunks are named 0.<name>, 1.<name>, ...
            String fname = c + "." + f.getName();
            c++;
            FileOutputStream fos = new FileOutputStream(new File(fname));
            fos.write(b, 0, ch);
            fos.flush();
            fos.close();
        }

        fis.close();
    }
}

Upvotes: 1

Peter Lawrey

Reputation: 533492

What happens when you do a binary comparison of the files, e.g. with diff? Do you see a difference after the first file?

Can you try splitting a plain text file? If bytes are out of place, it should be more obvious what is going wrong, e.g. a repeated block or file, or data full of NUL bytes.
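A binary comparison can also be done from Java itself. The following is only a minimal sketch of that idea: the class name FirstDifference and the method firstMismatch are made up for illustration and are not part of the question's code. It reports the offset of the first byte at which the original and the rejoined file differ.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not from the original post.
final class FirstDifference {
    /** Returns the offset of the first differing byte, or -1 if the files are identical. */
    static long firstMismatch(Path a, Path b) throws IOException {
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            long offset = 0;
            while (true) {
                int x = inA.read();
                int y = inB.read();
                if (x != y) {
                    return offset; // also covers one file ending before the other
                }
                if (x == -1) {
                    return -1; // both streams ended together with no difference
                }
                offset++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FirstDifference original.bin rejoined.bin
        System.out.println(firstMismatch(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Dividing the reported offset by the chunk size gives the index of the first chunk position that went wrong, which helps tell a misordered chunk from damaged data.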

EDIT: As others have noticed, you read the files in no particular order. What you can do is use a padded file number, like this:

newName = String.format("%s.part%09d", fname, nChunks - 1);

This will give you up to 1 billion files in numeric order.

When you read the files, you need to ensure they are sorted.

Arrays.sort(files);
for (File file : files) {

Using a custom comparator, as others have suggested, would reduce the size of the padded numbers, but it can be nice to be able to sort by name to get the correct order, e.g. in Explorer.
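To see why padding helps, here is a small sketch (the class PaddedNames and its methods are illustrative names, not part of the original code) that builds names with the same %09d format and sorts them as plain strings. Locale.ROOT is passed only so the digits do not depend on the machine's locale.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not from the original post.
final class PaddedNames {
    /** Builds a chunk name using the zero-padded format suggested above. */
    static String partName(String fname, int index) {
        return String.format(Locale.ROOT, "%s.part%09d", fname, index);
    }

    /** Returns the names for chunks 0..count-1 after a plain String sort. */
    static List<String> sortedNames(String fname, int count) {
        List<String> names = new ArrayList<>();
        for (int i = count - 1; i >= 0; i--) { // add in reverse so the sort has work to do
            names.add(partName(fname, i));
        }
        Collections.sort(names); // default lexicographic String order
        return names;
    }

    public static void main(String[] args) {
        for (String name : sortedNames("video.mp4", 12)) {
            System.out.println(name);
        }
    }
}
```

Because every number has the same width, lexicographic order and numeric order coincide, so chunk 2 sorts before chunk 10.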

Upvotes: 0

Haderlump

Reputation: 126

Are there more than 10 chunks? Then the program will concatenate *.part1 + *.part10 + *.part2 and so on.

Upvotes: 1

BalusC

Reputation: 1108712

I can spot only 2 potential mistakes in the code:

int fileSize = (int) ifile.length();

The above fails when the file is larger than 2 GB, since an int cannot hold a larger value.

newName = fname + ".part" + Integer.toString(nChunks - 1);

A filename constructed like that has to be sorted in a very specific manner. With default string sorting, name.part10 will come before name.part2. You need to supply a custom Comparator which extracts and parses the part number as an int and compares by that instead.
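One possible shape for such a Comparator is sketched below. It assumes the chunk names end in ".part" followed only by digits, as in the question; the class name PartNumberOrder and its members are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not from the original post.
final class PartNumberOrder {
    private static final String MARKER = ".part";

    /** Parses the digits that follow the last ".part" in the name. */
    static int partNumber(String name) {
        int idx = name.lastIndexOf(MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("Not a chunk name: " + name);
        }
        return Integer.parseInt(name.substring(idx + MARKER.length()));
    }

    /** Orders chunk files by their numeric part number rather than by name. */
    static final Comparator<File> BY_PART_NUMBER =
            Comparator.comparingInt(f -> partNumber(f.getName()));

    /** Returns a sorted copy and leaves the caller's list untouched. */
    static List<File> sorted(List<File> chunks) {
        List<File> copy = new ArrayList<>(chunks);
        copy.sort(BY_PART_NUMBER);
        return copy;
    }
}
```

In the joining code, the list of chunk files would be passed through sorted(...) (or sorted in place with files.sort(PartNumberOrder.BY_PART_NUMBER)) before the loop that appends them.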

Upvotes: 12

Karl Knechtel

Reputation: 61509

And for joining file, I put the names of all chunks in a List, then sort it by name and then run the following code:

But your names are of the following form:

newName = fname + ".part" + Integer.toString(nChunks - 1);

Think carefully about what happens if you have 11 or more parts. Which string comes first in alphabetical order: ".part10" or ".part2"? (Answer: ".part10", since '1' comes before '2' in the character encoding.)
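The ordering is easy to check directly. This is just an illustrative snippet (the class name LexicographicDemo is made up) that sorts a few names in the question's format with the default String ordering:

```java
import java.util.Arrays;

// Hypothetical demo, not from the original post.
final class LexicographicDemo {
    /** Returns a copy of the names sorted in default (lexicographic) String order. */
    static String[] sortedCopy(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortedCopy("f.part2", "f.part10", "f.part1", "f.part0");
        // prints [f.part0, f.part1, f.part10, f.part2]
        System.out.println(Arrays.toString(sorted));
    }
}
```

With eleven or more chunks, joining in this order puts chunk 10 right after chunk 1, which matches the symptom of a file that has the right size but wrong contents.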

Upvotes: 4
