Reputation: 5931
In my current company, I am doing a PoC on how we can write a file downloader utility. We have to use socket programming (TCP/IP) for downloading the files. One of the client's requirements is that a file (which will be large) should be transferred in chunks; for example, a 5 MB file could be transferred by 5 threads that move 1 MB each. I have written a small application which downloads a file. You can download the Eclipse project from http://www.fileflyer.com/view/QM1JSC0
A brief explanation of my classes
FileSender.java: This class provides the bytes of the file. It has a method called sendBytesOfFile(long start, long end, long sequenceNo) which returns the bytes from offset start (inclusive) to end (exclusive), wrapped together with the sequence number.
import java.io.File;
import java.io.IOException;
import java.util.zip.CRC32;
import org.apache.commons.io.FileUtils;
public class FileSender {
private static final String FILE_NAME = "C:\\shared\\test.pdf";
public ByteArrayWrapper sendBytesOfFile(long start,long end, long sequenceNo){
try {
File file = new File(FILE_NAME);
byte[] fileBytes = FileUtils.readFileToByteArray(file);
System.out.println("Size of file is " +fileBytes.length);
System.out.println();
System.out.println("Start "+start +" end "+end);
byte[] bytes = getByteArray(fileBytes, start, end);
ByteArrayWrapper wrapper = new ByteArrayWrapper(bytes, sequenceNo);
return wrapper;
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private byte[] getByteArray(byte[] bytes, long start, long end){
long arrayLength = end-start;
System.out.println("Start : "+start +" end : "+end + " Arraylength : "+arrayLength +" length of source array : "+bytes.length);
byte[] arr = new byte[(int)arrayLength];
for(int i = (int)start, j =0; i < end;i++,j++){
arr[j] = bytes[i];
}
return arr;
}
public static long fileSize(){
File file = new File(FILE_NAME);
return file.length();
}
}
FileReceiver.java - This class receives the file.
Small Explanation what this file does
Code of File Receiver
package com.filedownloader;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.zip.CRC32;
import org.apache.commons.io.FileUtils;
public class FileReceiver {
public static void main(String[] args) {
FileReceiver receiver = new FileReceiver();
receiver.receiveFile();
}
public void receiveFile(){
long startTime = System.currentTimeMillis();
long numberOfThreads = 10;
long filesize = FileSender.fileSize();
System.out.println("File size received "+filesize);
long start = filesize/numberOfThreads;
List<ByteArrayWrapper> list = new ArrayList<ByteArrayWrapper>();
for(long threadCount =0; threadCount<numberOfThreads ;threadCount++){
FileDownloaderTask task = new FileDownloaderTask(threadCount*start,(threadCount+1)*start,threadCount,list);
new Thread(task).start();
}
while(list.size() != numberOfThreads){
// this is done so that all the threads should complete their work before processing further.
//System.out.println("Waiting for threads to complete. List size "+list.size());
}
if(list.size() == numberOfThreads){
System.out.println("All bytes received "+list);
Collections.sort(list, new Comparator<ByteArrayWrapper>() {
@Override
public int compare(ByteArrayWrapper o1, ByteArrayWrapper o2) {
long sequence1 = o1.getSequence();
long sequence2 = o2.getSequence();
if(sequence1 < sequence2){
return -1;
}else if(sequence1 > sequence2){
return 1;
}
else{
return 0;
}
}
});
byte[] totalBytes = list.get(0).getBytes();
byte[] firstArr = null;
byte[] secondArr = null;
for(int i = 1;i<list.size();i++){
firstArr = totalBytes;
secondArr = list.get(i).getBytes();
totalBytes = concat(firstArr, secondArr);
}
System.out.println(totalBytes.length);
convertToFile(totalBytes,"c:\\tmp\\test.pdf");
long endTime = System.currentTimeMillis();
System.out.println("Total time taken with "+numberOfThreads +" threads is "+(endTime-startTime)+" ms" );
}
}
private byte[] concat(byte[] A, byte[] B) {
byte[] C= new byte[A.length+B.length];
System.arraycopy(A, 0, C, 0, A.length);
System.arraycopy(B, 0, C, A.length, B.length);
return C;
}
private void convertToFile(byte[] totalBytes,String name) {
try {
FileUtils.writeByteArrayToFile(new File(name), totalBytes);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
Code of ByteArrayWrapper
package com.filedownloader;
import java.io.Serializable;
public class ByteArrayWrapper implements Serializable{
private static final long serialVersionUID = 3499562855188457886L;
private byte[] bytes;
private long sequence;
public ByteArrayWrapper(byte[] bytes, long sequenceNo) {
this.bytes = bytes;
this.sequence = sequenceNo;
}
public byte[] getBytes() {
return bytes;
}
public long getSequence() {
return sequence;
}
}
Code of FileDownloaderTask
import java.util.List;
public class FileDownloaderTask implements Runnable {
private List<ByteArrayWrapper> list;
private long start;
private long end;
private long sequenceNo;
public FileDownloaderTask(long start,long end,long sequenceNo,List<ByteArrayWrapper> list) {
this.list = list;
this.start = start;
this.end = end;
this.sequenceNo = sequenceNo;
}
@Override
public void run() {
ByteArrayWrapper wrapper = new FileSender().sendBytesOfFile(start, end, sequenceNo);
list.add(wrapper);
}
}
Questions related to this code
1. Does file downloading become faster when multiple threads are used? In this code I am not able to see the benefit.
2. How should I decide how many threads to create?
3. Are there any open-source libraries which do this?
4. The file which the file receiver receives is valid and not corrupted, but the checksum (I used FileUtils of commons-io) does not match. What's the problem?
5. This code runs out of memory when used with a large file (above 100 MB), because of the byte array which is created. How can I avoid this?
6. I know this is very bad code, but I have to write this in one day :-). Please suggest any other good way to do this.
Upvotes: 0
Views: 1053
Reputation: 310998
Don't read huge file chunks into memory. No wonder you're running out. Just seek to the required position in the file and start copying via a sensibly sized buffer:
int count;
byte[] buffer = new byte[8192];
// or whatever takes your fancy, but sizes > the socket send buffer size are pointless
while ((count = in.read(buffer)) > 0)
out.write(buffer, 0, count);
out.close();
in.close();
The same logic can be used at both ends: when writing the file at the receiver, use a RandomAccessFile and seek to the appropriate offset before starting this loop.
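As a rough illustration of the seek-then-copy idea, here is a minimal sketch that copies one byte range between two local files. The class name ChunkCopy and the method copyRange are made up for this example; in the real receiver the input side would be the socket's InputStream rather than a second RandomAccessFile.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ChunkCopy {
    // Copies bytes [start, end) of source to the same offsets in target,
    // going through a small fixed buffer instead of one huge byte array.
    public static void copyRange(String source, String target, long start, long end)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source, "r");
             RandomAccessFile out = new RandomAccessFile(target, "rw")) {
            in.seek(start);
            out.seek(start); // seeking past the current end is allowed; the write extends the file
            byte[] buffer = new byte[8192];
            long remaining = end - start;
            while (remaining > 0) {
                int count = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (count < 0) {
                    break; // source ended earlier than expected
                }
                out.write(buffer, 0, count);
                remaining -= count;
            }
        }
    }
}
```

Because each call writes at its own offset, chunks can arrive in any order and memory use stays at the buffer size regardless of the file size.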
However as other respondents have noted, the client's requirement is really pretty pointless. It doesn't buy anything much except expense and risk. I would just stream the file via a single connection.
What you should do is set large socket send and receive buffers at both ends, e.g. 60k. The default is 8k on Windows, which is uselessly low.
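A small sketch of how the buffer sizes might be set with the standard java.net API. The class name BufferTuning and the 60k constant are just for illustration, and the operating system treats these values as hints, so the size actually used may differ.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BufferTuning {
    static final int BUFFER_SIZE = 60 * 1024; // the "60k" from the text; tune for your network

    // Creates an unconnected client socket with larger buffers.
    // The caller still has to call connect(...) on it.
    public static Socket newClientSocket() throws IOException {
        Socket socket = new Socket();
        socket.setReceiveBufferSize(BUFFER_SIZE); // set before connecting
        socket.setSendBufferSize(BUFFER_SIZE);
        return socket;
    }

    // Creates an unbound server socket; sockets returned by accept()
    // inherit the receive buffer size requested here.
    public static ServerSocket newServerSocket() throws IOException {
        ServerSocket server = new ServerSocket();
        server.setReceiveBufferSize(BUFFER_SIZE);
        return server;
    }
}
```

On the server side the send buffer can be set on each accepted Socket with setSendBufferSize before writing to it.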
Upvotes: 0
Reputation: 18865
1) Another reason why multiple connections may be faster is related to TCP window size.
throughput <= window size / roundtrip time
See http://en.wikipedia.org/wiki/TCP_tuning#Window_size for details.
You won't see much difference if you run tests on a local network, because the round-trip time is small enough.
2) The only way to know for sure is to try. The right number of threads will depend on the environment. If you need to download really big files, it might be worth first running a small calibration program that tries downloading with different numbers of threads.
3) I haven't looked there for a long time, but Azureus (now called Vuze) has a pretty complete API to download anything from torrent files to FTP... And they probably have quite an efficient implementation.
Good luck !
Edit (clarification on window size) :
What you are trying to do is maximize throughput (download files faster). There is not much you can do about the round-trip time; it depends on the network. What you can do is increase the window size. The window size is automagically adjusted (there is plenty of documentation on this, but I'm too lazy to google it) to best fit the current state of the network. Basically, a larger window means better throughput as long as there isn't congestion or packet loss.
In the best case, you will get a window size of 64 KB. At this point, unless you use some tricks (jumbo frames / window scaling) which are not supported by all routers on the internet, you get stuck at a maximum throughput of:
throughput <= 64 KB / roundtrip time
As you can't get a bigger window, you have to open multiple connections, each with its own window, to get around this limitation.
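To put numbers on the bound, here is a small sketch that evaluates window size / roundtrip time for a few round-trip times. It assumes the largest window without window scaling, 65,535 bytes; the class name WindowBound and the sample round-trip times are made up for the example.

```java
public class WindowBound {
    // Upper bound on the throughput of a single TCP connection, in bytes per second.
    public static double maxBytesPerSecond(long windowBytes, double rttSeconds) {
        return windowBytes / rttSeconds;
    }

    public static void main(String[] args) {
        long window = 65_535; // assumed: maximum TCP window without window scaling
        for (double rttMs : new double[] {1, 20, 100}) {
            double bound = maxBytesPerSecond(window, rttMs / 1000.0);
            System.out.printf("RTT %.0f ms: at most %.2f MB/s per connection%n",
                    rttMs, bound / 1_000_000);
        }
    }
}
```

With a 100 ms round trip the bound is 65,535 / 0.1 = 655,350 bytes per second, roughly 0.66 MB/s per connection, which is why several parallel connections can help on long-distance links and make little difference on a LAN.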
Notes :
Upvotes: 1
Reputation: 421060
1 Does file downloading become faster when multiple threads are used? In this code I am not able to see the benefit.
No. I would be very surprised if that was the case. The CPU would never have a problem keeping up with feeding the network buffer.
2 How should I decide how many threads to create?
In my opinion, 0 extra threads.
4 The file which the file receiver receives is valid and not corrupted, but the checksum (I used FileUtils of commons-io) does not match. What's the problem?
Make sure you don't accidentally rely on strings and specific encodings.
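One way to rule out encoding issues is to compute the checksum directly over the raw bytes on both sides and compare the results together with the file lengths. A minimal sketch using java.util.zip.CRC32 follows; the class name ChecksumCheck is made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

public class ChecksumCheck {
    // Computes the CRC32 of a file by streaming its raw bytes through a
    // small buffer; no Strings or character encodings are involved.
    public static long crc32(String path) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int count;
            while ((count = in.read(buffer)) != -1) {
                crc.update(buffer, 0, count);
            }
        }
        return crc.getValue();
    }
}
```

If the two files differ in length by even one byte the checksums will differ, so comparing new File(path).length() on both sides first is a cheap sanity check.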
5 This code runs out of memory when used with a large file (above 100 MB), because of the byte array which is created. How can I avoid this?
The obvious solution would be to read smaller chunks of the file. Have a look at the read method of DataInputStream
http://java.sun.com/j2se/1.4.2/docs/api/java/io/DataInputStream.html#read%28byte[],%20int,%20int%29
And, finally, some general pointers in the matter: Instead of using multiple threads for this kind of thing, I strongly encourage you to have a look at the java.nio package, specifically java.nio.channels and the Selector class.
EDIT: If you're really keen on getting it super-efficient, and have very large files, you could benefit from using UDP, and handle packet order and acknowledgements yourself. TCP does for instance guarantee that the packets received come in the same order as the packets sent. This is not something that you rely heavily on (since you could easily encode the "byte-offset" for each datagram yourself) and thus don't need to "pay" for.
Upvotes: 0
Reputation: 32293
There's a bunch of questions here to answer. I'm not going to go through all of the code, but I can give you some tips.
First off, what some download accelerators do is indeed use the HTTP Range header to download parts of a file in parallel. Why does this work? TCP tries to allocate bandwidth fairly per connection. So if you're downloading a file from a server whose bandwidth is swamped, then you can receive a bigger share of the bandwidth by adding more connections. The same principle applies to servers that restrict outgoing bandwidth, which is usually also applied per connection (sometimes taking the IP into consideration).
Obviously if everybody was doing this, we'd be left with a whole lot of TCP connections and their overhead, and not a lot of bandwidth to do the actual downloading, which is why even these download accelerators will only use 2-4 connections. Moreover, if you are the one writing the server, you really don't need to worry about this, as you will only be slowing yourself down (by adding more overhead).
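For illustration, this is roughly how a file could be split into Range header values, one per connection. The class name RangeSplitter is made up for the example, and it assumes the file has at least as many bytes as there are parts. Note that HTTP byte ranges are inclusive at both ends, and that plain integer division (as in filesize/numberOfThreads) leaves a remainder, which the last range absorbs here.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
    // Returns HTTP Range header values that together cover bytes 0..fileSize-1.
    public static List<String> rangeHeaders(long fileSize, int parts) {
        if (parts <= 0 || fileSize < parts) {
            throw new IllegalArgumentException("need parts > 0 and fileSize >= parts");
        }
        List<String> headers = new ArrayList<>();
        long chunk = fileSize / parts; // integer division drops the remainder
        for (int i = 0; i < parts; i++) {
            long start = i * chunk;
            // The last part runs to the final byte so the remainder is not lost.
            long end = (i == parts - 1) ? fileSize - 1 : start + chunk - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }
}
```

For a 10-byte file and 3 parts this yields bytes=0-2, bytes=3-5 and bytes=6-9; each value would be sent as the Range request header on its own connection.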
Going out of memory: don't use a byte array; use a (buffered) InputStream (or, if you have some time, learn how to use java.nio and the byte buffers) and read chunks as you are sending the file. The Java tutorials cover all the basics.
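As one possible java.nio variant, the sketch below sends a byte range of a file to any WritableByteChannel (a SocketChannel is one) with FileChannel.transferTo, so the data never has to sit in a large array in your own code. The class name NioRangeSender and the method sendRange are made up for the example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioRangeSender {
    // Sends up to `count` bytes of `file`, starting at offset `start`,
    // to `target`. Returns the number of bytes actually transferred.
    public static long sendRange(Path file, long start, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(start + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached or nothing more could be written
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

The target channel is left open, so the caller decides when to close the connection.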
Upvotes: 1