Jeff

Reputation: 7210

Maximizing goroutine performance

I'm writing a data mover in Go: it takes data located in one data center and moves it to another data center. I figured Go would be perfect for this, given goroutines.

I notice that if I run one program with 1800 goroutines, the amount of data being transmitted is really low.

Here's the dstat output, averaged over 30 seconds:

---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
 1m   5m  15m |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
0.70 3.58 4.42| 10   1  89   0   0   0|   0   156k|7306k 6667k|   0     0 |  11k 6287 
0.61 3.28 4.29| 12   2  85   0   0   1|   0  6963B|8822k 8523k|   0     0 |  14k 7531 
0.65 3.03 4.18| 12   2  86   0   0   1|   0  1775B|8660k 8514k|   0     0 |  13k 7464 
0.67 2.81 4.07| 12   2  86   0   0   1|   0  1638B|8908k 8735k|   0     0 |  13k 7435 
0.67 2.60 3.96| 12   2  86   0   0   1|   0   819B|8752k 8385k|   0     0 |  13k 7445 
0.47 2.37 3.84| 11   2  86   0   0   1|   0  2185B|8740k 8491k|   0     0 |  13k 7548 
0.61 2.22 3.74| 10   2  88   0   0   0|   0  1229B|7122k 6765k|   0     0 |  11k 6228 
0.52 2.04 3.63|  3   1  97   0   0   0|   0   546B|1999k 1365k|   0     0 |3117  2033 

If I run 9 instances of the program with 200 goroutines each, I see much better performance:

---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
 1m   5m  15m |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
8.34 9.56 8.78| 53   8  36   0   0   3|   0   410B|  38M   32M|   0     0 |  41k   26k
8.01 9.37 8.74| 74  10  12   0   0   4|   0   137B|  51M   51M|   0     0 |  59k   39k
8.36 9.31 8.74| 75   9  12   0   0   4|   0  1092B|  51M   51M|   0     0 |  59k   39k
6.93 8.89 8.62| 74  10  12   0   0   4|   0  5188B|  50M   49M|   0     0 |  59k   38k
7.09 8.73 8.58| 75   9  12   0   0   4|   0   410B|  51M   50M|   0     0 |  60k   39k
7.40 8.62 8.54| 75   9  12   0   0   4|   0   137B|  52M   49M|   0     0 |  61k   40k
7.96 8.63 8.55| 75   9  12   0   0   4|   0   956B|  51M   51M|   0     0 |  59k   39k
7.46 8.44 8.49| 75   9  12   0   0   4|   0   273B|  51M   50M|   0     0 |  58k   38k
8.08 8.51 8.51| 75   9  12   0   0   4|   0   410B|  51M   51M|   0     0 |  59k   39k

The load average is a little high, but I'll worry about that later. The network traffic, though, is close to the network's capacity.

I'm on Ubuntu 12.04 with 8 GB of RAM and 2.3 GHz processors (according to EC2 :P).

Also, I've increased my file descriptor limit from 1024 to 10240.
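Since every open socket consumes a file descriptor, it can be worth confirming from inside the process that the raised limit actually applies to it (limits set in one shell do not necessarily carry over to a service started elsewhere). A minimal sketch, assuming Linux or another Unix where `syscall.Getrlimit` is available; `fdLimits` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"syscall"
)

// fdLimits returns the soft (currently enforced) and hard (maximum) limits
// on open file descriptors for this process. Unix-only: uses getrlimit(2).
func fdLimits() (uint64, uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := fdLimits()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
}
```

If the soft value printed here is still 1024, the process never saw the new limit.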

I thought Go was designed for this kind of thing. Or am I expecting too much of Go for this application?

Is there something trivial that I'm missing? Do I need to configure my system to maximize Go's potential?

EDIT

I guess my question wasn't clear enough, sorry. I'm not asking for magic from Go; I know computers have limits on what they can handle. So I'll rephrase: why doesn't 1 instance with 1800 goroutines perform like 9 instances with 200 goroutines each? It's the same total number of goroutines, yet the single instance performs significantly worse.

Upvotes: 1

Views: 886

Answers (2)

tike

Reputation: 2294

Please note that goroutines are limited to your local machine and that channels are not natively network-enabled, i.e. your particular case probably doesn't play to Go's strengths.

Also: what did you expect from throwing (supposedly) every transfer into its own goroutine? I/O operations tend to have their bottleneck where the bits hit the metal, i.e. the physical transfer of the data to the medium. Think of it like this: no matter how many threads (or goroutines, in this case) try to write to the network card, you still only have one network card. Most likely, hitting it with too many concurrent write calls will only slow things down, since the overhead involved increases.

If you think this is not the problem, or want to audit your code for performance, Go has neat built-in features for that: profiling Go programs (official Go blog). Still, the actual bottleneck may well lie outside your Go program and/or in the way it interacts with the OS.

Addressing your actual problem without seeing code is just guessing. Post some, and everyone will try their best to help you.

Upvotes: 2

StianE

Reputation: 3175

You will probably have to post your source code to get any real input, but just to be sure: have you increased the number of CPUs Go is allowed to use?

package main

import "runtime"

func main() {
    // Let Go code run on all available CPU cores (before Go 1.5 the default was 1).
    runtime.GOMAXPROCS(runtime.NumCPU())
}

Upvotes: 1
