Tobs40

Reputation: 65

Java Multithreading Network Connection Performance Advice

I need to keep thousands of network connections open at the same time and transfer data as fast as possible. Right now I have one thread per connection, each reading character by character from that connection's InputStream. I strongly suspect that the CPU cost of switching between thousands of threads is hurting performance, even though the servers are really slow (low two-digit KB/s), because the throughput I observe isn't even close to proportional to the number of threads.

So I'd like to ask programmers experienced in parallel programming: Is it worth rewriting the entire program so that one thread reads from multiple InputStreams in a round-robin fashion? If there is a speedup, would it be worth the programming effort? How many connections per thread? Or do you have another idea for reading really fast from multiple network input streams?

If I don't read a char, will the server wait to send the next one until I do? What if my thread is sleeping?

Upvotes: 0

Views: 329

Answers (2)

prog-fh

Reputation: 16900

Thousands of threads (and their stacks...) are probably too many for the OS scheduler, the memory management unit, the caches...
You only need a few threads (one per CPU), each running a select()-based loop.
Have a look at Selector, ServerSocketChannel and SocketChannel.
(see pages 30-31 of https://www.enib.fr/~harrouet/Data/Courses/Memo_Sockets.pdf)


Edit (after a question in the comments)

Selector is not just a clever algorithm encapsulated in a class.
It relies internally on the select() system-call (or equivalent, there are many).
The operating system is aware of a set of file-descriptors (communication means) it has to watch and, as soon as something happens on one (or several) of them, it wakes up the process (or thread) which is blocked on this selector.
The idea is to stay blocked as long as possible (to save resources) and to be woken up only when something useful has to be done with incoming data (there are variants).

In your current implementation, you use thousands of threads which are all blocked on a read()/recv() operation because you cannot know beforehand which connection will be the next one to deliver something.
On the other hand, with a select()-based implementation, a single thread can be blocked watching many connections at the same time and only reacts to the few that have just delivered new data.
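To make the idea concrete, here is a minimal sketch of one thread watching many connections through a Selector. The class name SelectorReader, the readAll method and the 16 KiB buffer size are illustrative choices and not part of the question; the sketch connects in blocking mode for brevity and only counts bytes where real code would process them.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.List;

public class SelectorReader {

    /**
     * Connects to every address, then reads from all connections on the
     * calling thread until each peer has closed. Returns the total byte count.
     */
    public static long readAll(List<InetSocketAddress> servers) throws IOException {
        long total = 0;
        try (Selector selector = Selector.open()) {
            for (InetSocketAddress addr : servers) {
                SocketChannel ch = SocketChannel.open(addr); // blocking connect, for brevity
                ch.configureBlocking(false);                 // required before register()
                ch.register(selector, SelectionKey.OP_READ);
            }
            int open = servers.size();
            ByteBuffer buf = ByteBuffer.allocate(16 * 1024); // assumed size, reused for every read
            while (open > 0) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // the selected-key set is not cleared automatically
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    int n = ch.read(buf); // returns whatever is available, never blocks
                    if (n < 0) {          // peer closed the connection
                        ch.close();       // closing also cancels the key
                        open--;
                    } else {
                        total += n;       // real code would process buf here
                    }
                }
            }
        }
        return total;
    }
}
```

The thread sleeps inside select() and wakes only when the OS reports data, so the number of threads no longer grows with the number of connections.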

So I suggest that you start a pool of a few threads (one per CPU, for example). As soon as the main program accepts a new incoming connection, it chooses one of those threads (you can keep a connection count for each of them) and puts it in charge of the new connection.
All of this requires proper synchronisation, of course, and probably a trick (a special file descriptor in the selector, for example) to wake up a blocked thread when it is assigned a new connection.
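In Java the wake-up trick does not need a hand-made file descriptor, because Selector.wakeup() makes a blocked select() return. The following is a rough sketch of one pool worker under that approach; SelectorWorker, assign, connectionCount and the other names are hypothetical, and the sketch only counts bytes instead of processing them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class SelectorWorker implements Runnable {
    private final Selector selector;
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
    private final AtomicInteger connections = new AtomicInteger();
    private final AtomicLong bytesRead = new AtomicLong();
    private volatile boolean running = true;

    public SelectorWorker() throws IOException {
        selector = Selector.open();
    }

    /** Called from the main thread to hand a connection to this worker. */
    public void assign(SocketChannel ch) {
        connections.incrementAndGet();
        pending.add(ch);
        selector.wakeup(); // makes the blocked select() return so the queue is seen
    }

    /** Lets the main thread pick the least loaded worker. */
    public int connectionCount() { return connections.get(); }

    public long bytesRead() { return bytesRead.get(); }

    public void shutdown() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(16 * 1024); // assumed buffer size
        try {
            while (running) {
                selector.select();
                registerPending(); // registration happens on the worker thread itself
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    readFrom((SocketChannel) key.channel(), buf);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closeAll();
        }
    }

    private void registerPending() throws IOException {
        SocketChannel ch;
        while ((ch = pending.poll()) != null) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
        }
    }

    private void readFrom(SocketChannel ch, ByteBuffer buf) {
        try {
            buf.clear();
            int n = ch.read(buf);
            if (n >= 0) {
                bytesRead.addAndGet(n); // real code would process buf here
                return;
            }
        } catch (IOException e) {
            // a broken connection only affects this channel; fall through and close it
        }
        close(ch);
    }

    private void close(SocketChannel ch) {
        connections.decrementAndGet();
        try { ch.close(); } catch (IOException ignored) { }
    }

    private void closeAll() {
        SocketChannel ch;
        while ((ch = pending.poll()) != null) close(ch);
        for (SelectionKey key : selector.keys()) close((SocketChannel) key.channel());
        try { selector.close(); } catch (IOException ignored) { }
    }
}
```

The main thread would create one SelectorWorker per CPU, start each on its own thread, and call assign() on the worker with the lowest connectionCount(). Queuing the channel and registering it on the worker thread avoids registering from another thread while the worker is blocked in select().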

Upvotes: 1

Joni

Reputation: 111259

"reading charwise"

You know data is transmitted in packets, right? Reading a single character at a time is very inefficient, because each read has to traverse all the layers from your program down to the network stack in the operating system. Read into a buffer of a few kilobytes instead, so that each call returns as much data as is currently available.
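As a small illustration, these are two common ways to avoid per-character system calls. The class and method names are made up for this sketch, and the 8 KiB buffer and the UTF-8 charset are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkedRead {

    /** Reads the stream in chunks of up to 8 KiB and returns the byte count. */
    public static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // one call can return many bytes
            total += n;                    // real code would process buf[0..n) here
        }
        return total;
    }

    /**
     * Still reads char by char, but through a BufferedReader, so most read()
     * calls are served from memory instead of going to the OS.
     */
    public static long countChars(InputStream in) throws IOException {
        Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8)); // charset is an assumption
        long chars = 0;
        while (reader.read() != -1) {
            chars++;
        }
        return chars;
    }
}
```

With a socket, the argument would be socket.getInputStream(). The second variant needs the smallest change to code that already processes one character at a time.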

"If I don't read a char, will the server wait to send the next one until I do? What if my thread is sleeping?"

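That's why the operating system keeps a receive buffer for incoming data. When TCP segments arrive, they are put into the receive buffer, and when your program reads from the socket, the operating system returns data from it. The free space in that buffer is advertised to the sender as the receive window. If you don't read and the buffer fills up, the window shrinks to zero and the sender pauses until you read again; anything sent beyond the window is dropped and has to be retransmitted. So a sleeping thread does not lose data, but it does eventually make the server wait.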
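One way to observe what happens when the reader never reads is a loopback experiment like the sketch below. Everything in it (the class name, the 64 KiB chunk, the 20 idle rounds, the 1 GiB safety cap) is an assumption made for the demonstration, and the number it reports depends on the operating system and its buffer settings.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BackpressureDemo {
    private static final long CAP = 1L << 30; // safety limit so the loop always ends

    /** Returns how many bytes the sender could write while the receiver never reads. */
    public static long bytesAcceptedBeforeStall() throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel receiver = SocketChannel.open(server.getLocalAddress());
                 SocketChannel sender = server.accept()) {
                sender.configureBlocking(false); // write() returns 0 instead of blocking
                ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
                long sent = 0;
                int idleRounds = 0;
                // Stop once write() has made no progress for 20 rounds (about 200 ms).
                while (idleRounds < 20 && sent < CAP) {
                    chunk.clear();
                    int n = sender.write(chunk);
                    if (n == 0) {
                        idleRounds++;
                        Thread.sleep(10);
                    } else {
                        idleRounds = 0;
                        sent += n;
                    }
                }
                return sent; // the receiver never called read()
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Sender stalled after " + bytesAcceptedBeforeStall() + " bytes");
    }
}
```

The sender stalls after a bounded amount of data because the send and receive buffers are full. Nothing is lost, and writing would resume as soon as the receiver started reading.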

For more about how TCP works, see https://beej.us/guide/bgnet/. Wikipedia is pretty good but fairly dense: https://en.m.wikipedia.org/wiki/Transmission_Control_Protocol

"Is it worth rewriting the entire program so that one thread reads from multiple InputStreams in a round robin like fashion? Would that, if there is a speedup, be worth the programming?"

What you're describing would require moving from blocking I/O to non-blocking I/O. Non-blocking I/O requires fewer system resources, but it is significantly harder to implement correctly and efficiently. So don't do it unless you have a pressing reason.

Upvotes: 2
