Iurii

Reputation: 1795

Split large file into chunks

I have a method that accepts a file and a chunk size and returns a list of chunk files. The main problem is that a line in the file can be broken across two chunks. For example, suppose the main file contains these lines:

|1|aaa|bbb|ccc|
|2|ggg|ddd|eee|

After the split, one file could contain:

|1|aaa|bbb

And another file:

|ccc|2|
|ggg|ddd|eee|

Here is the code:

public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
  int counter = 1;
  List<File> files = new ArrayList<>();

  int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
  byte[] buffer = new byte[sizeOfChunk];

  try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
    String name = file.getName();

    int tmp = 0;
    while ((tmp = bis.read(buffer)) > 0) {
        File newFile = new File(file.getParent(), name + "."
                + String.format("%03d", counter++));
        try (FileOutputStream out = new FileOutputStream(newFile)) {
            out.write(buffer, 0, tmp);
        }

        files.add(newFile);
    }
  }

  return files;
}

Should I use the RandomAccessFile class for this purpose? The main file is really big (more than 5 GB).

Upvotes: 10

Views: 24072

Answers (4)

prem

Reputation: 117

Split the file into chunks according to your chunk size:

    // Reads the whole file into memory, so this only suits files that fit in a single ByteArray
    val data = FileInputStream(file).use { it.readBytes() }
    var start = 0
    val max = data.size
    while (start < max) {
        // copyOfRange's end index is exclusive, so clamp to max without subtracting 1
        val end = minOf(start + CHUNK_SIZE, max)
        val subData = data.copyOfRange(start, end)
        // Function to upload your chunk
        uploadFileInChunk(subData, isLast = end == max)
        start = end
    }

If you receive the file from the user through an intent, you may get a content URI instead of a file path. In that case, read the bytes through the content resolver:

  Uri uri = data.getData();
  InputStream inputStream = getContext().getContentResolver().openInputStream(uri);
  fileInBytes = IOUtils.toByteArray(inputStream);

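Add the dependency to your build.gradle to use IOUtils (the old `compile` configuration has been removed from recent Gradle versions, so `implementation` is used here):

 implementation 'commons-io:commons-io:2.11.0'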

Now make a small modification to the code above to send your file to the server.

         var start = 0
         val max = fileInBytes.size
         while (start < max) {
             // copyOfRange's end index is exclusive, so clamp to max without subtracting 1
             val end = minOf(start + CHUNK_SIZE, max)
             val subData = fileInBytes.copyOfRange(start, end)
             uploadFileInChunk(subData, isLast = end == max)
             start = end
         }
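
Loading the whole file into a byte array will not work for multi-gigabyte inputs like the one in the question. As a rough sketch of the same fixed-size chunking done on a stream instead, here is a Java version. The `StreamChunker` class and the `ChunkHandler` callback are hypothetical names introduced for illustration; `ChunkHandler` stands in for `uploadFileInChunk`. It reads one chunk ahead so it can tell the handler which chunk is the last one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamChunker {

    /** Hypothetical callback standing in for uploadFileInChunk. */
    public interface ChunkHandler {
        void handle(byte[] chunk, boolean isLast) throws IOException;
    }

    /** Reads until the buffer is full or the stream ends; returns the number of bytes read. */
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Passes chunks of at most chunkSize bytes to the handler, holding only two chunks in memory. */
    public static void forEachChunk(InputStream in, int chunkSize, ChunkHandler handler)
            throws IOException {
        byte[] current = new byte[chunkSize];
        byte[] next = new byte[chunkSize];
        int currentLen = fill(in, current);
        while (currentLen > 0) {
            // Read ahead: if nothing follows, the current chunk is the last one
            int nextLen = fill(in, next);
            handler.handle(Arrays.copyOf(current, currentLen), nextLen == 0);
            byte[] tmp = current;
            current = next;
            next = tmp;
            currentLen = nextLen;
        }
    }
}
```

An empty stream produces no calls to the handler. Note that this still splits on byte boundaries, so it does not keep lines intact.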
     

Upvotes: 0

Seb

Reputation: 21

Just in case anyone is interested in a Kotlin version. It creates an iterator of ByteArray chunks:

    class ByteArrayReader(val input: InputStream, val chunkSize: Int, val bufferSize: Int = 1024*8): Iterator<ByteArray> {
    
        var eof: Boolean = false
    
        init {
            if ((chunkSize % bufferSize) != 0) {
                throw RuntimeException("ChunkSize(${chunkSize}) should be a multiple of bufferSize (${bufferSize})")
            }
        }
        override fun hasNext(): Boolean = !eof
    
        override fun next(): ByteArray {
            val buffer = ByteArray(bufferSize)
            val chunkWriter = ByteArrayOutputStream(chunkSize) // no need to close - implementation is empty
            var offset = 0
            while (offset < chunkSize) {
                // Never request more than the chunk still needs, so a short read cannot overshoot chunkSize
                val bytesRead = input.read(buffer, 0, minOf(bufferSize, chunkSize - offset))
                if (bytesRead < 0) {
                    eof = true
                    break
                }
                chunkWriter.write(buffer, 0, bytesRead)
                offset += bytesRead
            }
            // Note: the final chunk is empty when the input length is an exact multiple of chunkSize
            return chunkWriter.toByteArray()
        }
    
    }

Upvotes: 2

Ajith

Reputation: 97

Split a file into multiple chunks in memory. Here I split the file into chunks of 500 KB (500,000 bytes) and add each chunk to a list:

public static List<ByteArrayOutputStream> splitFile(File f) {
    List<ByteArrayOutputStream> datalist = new ArrayList<>();
    try {

        int sizeOfFiles = 500000;
        byte[] buffer = new byte[sizeOfFiles];

        try (FileInputStream fis = new FileInputStream(f); BufferedInputStream bis = new BufferedInputStream(fis)) {

            int bytesAmount = 0;
            while ((bytesAmount = bis.read(buffer)) > 0) {
                try (OutputStream out = new ByteArrayOutputStream()) {
                    out.write(buffer, 0, bytesAmount);
                    out.flush();
                    datalist.add((ByteArrayOutputStream) out);
                }
            }
        }
    } catch (Exception e) {
        //get the error
    }

    return datalist;
}

Upvotes: 0

rsutormin

Reputation: 1649

If you don't mind having chunks of different lengths (each <= sizeOfChunk but as close to it as possible), then here is the code:

public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<File>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    String eof = System.lineSeparator();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String name = file.getName();
        String line = br.readLine();
        while (line != null) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
                int fileSize = 0;
                while (line != null) {
                    byte[] bytes = (line + eof).getBytes(Charset.defaultCharset());
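                    // Always write at least one line per file; otherwise a line longer than
                    // sizeOfChunk would never be consumed and the outer loop would not terminate
                    if (fileSize > 0 && fileSize + bytes.length > sizeOfChunk)
                        break;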
                    out.write(bytes);
                    fileSize += bytes.length;
                    line = br.readLine();
                }
            }
            files.add(newFile);
        }
    }
    return files;
}

The only limitation here is the file charset, which is the system default charset in this example. If you want to be able to change it, let me know and I'll add a third parameter to the "splitFile" function for it.
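
As a rough sketch of what such a charset parameter could look like, here is a self-contained variant. The class name `LineSplitter` and the `maxChunkBytes` parameter (a limit in bytes rather than megabytes, which makes small tests easier) are assumptions made for illustration, not part of the code above. It also writes at least one line per file, so a single line longer than the limit ends up alone in an oversized chunk instead of stalling the loop.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {

    /**
     * Splits a text file on line boundaries into files of at most maxChunkBytes each
     * (except when a single line is longer than the limit).
     */
    public static List<File> splitFile(File file, long maxChunkBytes, Charset charset)
            throws IOException {
        List<File> files = new ArrayList<>();
        String lineSeparator = System.lineSeparator();
        int counter = 1;
        try (BufferedReader br = Files.newBufferedReader(file.toPath(), charset)) {
            String line = br.readLine();
            while (line != null) {
                File newFile = new File(file.getParent(),
                        file.getName() + "." + String.format("%03d", counter++));
                try (OutputStream out =
                             new BufferedOutputStream(Files.newOutputStream(newFile.toPath()))) {
                    long fileSize = 0;
                    while (line != null) {
                        // Encode with the caller's charset so the size check counts real bytes
                        byte[] bytes = (line + lineSeparator).getBytes(charset);
                        if (fileSize > 0 && fileSize + bytes.length > maxChunkBytes) {
                            break; // this line starts the next chunk
                        }
                        out.write(bytes);
                        fileSize += bytes.length;
                        line = br.readLine();
                    }
                }
                files.add(newFile);
            }
        }
        return files;
    }
}
```

For example, `LineSplitter.splitFile(file, 1024L * 1024 * sizeOfFileInMB, StandardCharsets.UTF_8)` would mirror the original call with an explicit charset. Two caveats: every line is written with the system line separator, so the original line endings are not preserved, and `Files.newBufferedReader` throws an exception on input that is malformed for the given charset.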

Upvotes: 13
