Reputation: 149
So I've got an interesting problem that seems counterintuitive to me. I am building a tool where the biggest bottleneck is the rate at which I can send packets. Currently I can handle over a million requests in less than 30 seconds, which is great, but I'm trying to squeeze out as much speed as possible. My idea was to attach a second Ethernet adapter to the machine and spin up two different net.Dialers like so:
net.Dialer{
    Timeout:   time.Duration(*timeoutPtr) * time.Second,
    LocalAddr: addr,
}
where addr is the local address of one of the two Ethernet adapters. Then I assign the dialers to jobs round-robin style like so:
for i, target := range targets {
    dialer = dialers[i%len(dialers)]
    ....
    go someNetworkFunction(dialer)
}
What's surprising to me is that when I run it with 2 adapters it executes much slower: 2 minutes instead of 30 seconds! I'm just trying to understand why giving the code two connections to send packets over slows it down instead of speeding it up. It doesn't appear that the modulus operation there should cause a 300% slowdown. Is there something happening at the kernel layer when trying to use both adapters to send at the same time? Any help would be appreciated.
Upvotes: 3
Views: 79
Reputation: 6084
There can be multiple factors in play:
If you profile your app, which at a million requests in 30 seconds is not spending much time in application code, you will probably see that syscall accounts for most of the time. syscall represents the CPU time spent outside the view of your application. If this syscall time increases non-linearly compared to the workload you are benchmarking, you have a bottleneck outside of your program.
Goroutines are scheduled against CPU cores (on the physical level). While they are easy to create, the actual switching between goroutines is not free of overhead. The implementation of someNetworkFunction can make a difference in throughput, since it can block resources or simply cause switching too often. You can try to manage this by instructing the Go program to use fewer threads with GOMAXPROCS. By tweaking this value, you can determine what the optimal value is for your program and hardware.
A more in-depth explanation of the scheduler can be found at https://rakyll.org/scheduler/
Upvotes: 2