Reputation: 43
I am diving into Golang and have a problem I have been working on for a few days; I just can't seem to grasp the concept of goroutines and how they are used.
Basically, I am trying to generate millions of random records. I have functions that make the random data and create a giant .csv file containing this data.
My code basically generates a random string and writes that string to a file, N times (where N is whatever you want).
My question is whether it's even possible to do this concurrently in order to reduce execution time. It seems that no matter how I approach this problem, I get the same benchmark as if I had done it without goroutines.
This is a sample of what I have so far:
func worker(c chan string) {
    for {
        c <- /* Generate random data using other functions here */
    }
    close(c)
}

func writer(s string) {
    csvfile.WriteString(s)
}

func main() {
    receive := make(chan string)
    for i := 0; i < 100; i++ {
        go worker(receive)
    }
    for i := 0; i < 10000; i++ {
        go writer(<-receive)
    }
}
Where I generate the data, I am using tons and tons of function calls from https://github.com/Pallinder/go-randomdata. Do you think that could be where I am losing all this time?
Any help would be appreciated.
Upvotes: 2
Views: 999
Reputation: 16420
You can give the channel a buffer so the workers don't block on every send:
receive := make(chan string, 1000)
Write speed is limited by your disk, so there's only so much you can do to help by writing concurrently, and from what you're telling us, generating the data concurrently doesn't help either.
Concurrency isn't the solution to everything that's slow; either accept that you're at the limit, or optimize.
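A minimal sketch of that idea, assuming a hypothetical generate() stand-in for your go-randomdata calls and a fixed record count (bufio cuts down on syscalls, but the disk is still the floor):

package main

import (
    "bufio"
    "math/rand"
    "os"
    "strconv"
    "sync"
)

// generate is a hypothetical stand-in for the real random-record code.
func generate() string {
    return strconv.Itoa(rand.Int()) + ",example\n"
}

func main() {
    f, err := os.Create("out.csv")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    w := bufio.NewWriter(f) // buffered writes mean fewer syscalls
    defer w.Flush()

    receive := make(chan string, 1000) // buffered so workers rarely block
    var wg sync.WaitGroup

    const workers = 8
    const perWorker = 1000000 / workers

    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < perWorker; j++ {
                receive <- generate()
            }
        }()
    }
    go func() { wg.Wait(); close(receive) }()

    // main is the only goroutine touching the file, so no locking is needed
    for s := range receive {
        w.WriteString(s)
    }
}

The workers only buy you anything if generate() is actually the expensive part; if the disk is the bottleneck, this runs about as fast as the serial version.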
Upvotes: 0
Reputation: 48076
I don't think you should be trying to use a goroutine here. File writes are almost always atomic; do you really want to make the mechanism that writes to your file concurrent? That would require a complicated locking mechanism, and it ultimately probably won't improve application performance because the writes themselves are still serialized.
If data generation were bottlenecking your program, then it would make sense to split that work off into goroutines and do the writing from the one place where you receive all the data. Something like:
for i := 0; i < 100; i++ {
    go worker(receive)
}

for {
    select {
    case item := <-receive:
        writer(item)
    case <-abort:
        cleanUp()
        return
    }
}
You can't just loop on some int while receiving from a channel and calling a function endlessly... You can receive from a channel in a select, though, or just by doing item := <-receive,
which blocks until one item gets read. In my example above I've provided some pseudocode to demonstrate more what your design should be in this case. You need an abort channel so you can get out of your goroutines should you want to stop the application. It should probably finalize the write to your file and then close it before returning.
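Filled out so it actually runs, that pseudocode might look something like the sketch below; the worker body, the record count, and wiring abort to Ctrl-C are assumptions for illustration:

package main

import (
    "bufio"
    "math/rand"
    "os"
    "os/signal"
    "strconv"
)

// worker is a placeholder for the real data generation.
func worker(c chan<- string) {
    for {
        c <- strconv.Itoa(rand.Int()) + ",example\n"
    }
}

func main() {
    f, err := os.Create("out.csv")
    if err != nil {
        panic(err)
    }
    w := bufio.NewWriter(f)

    // cleanUp finalizes the write to the file and closes it
    cleanUp := func() {
        w.Flush()
        f.Close()
    }

    receive := make(chan string, 1000)
    abort := make(chan os.Signal, 1)
    signal.Notify(abort, os.Interrupt) // Ctrl-C fires the abort case

    for i := 0; i < 100; i++ {
        go worker(receive)
    }

    written := 0
    for {
        select {
        case item := <-receive:
            w.WriteString(item)
            if written++; written == 1000000 {
                cleanUp()
                return
            }
        case <-abort:
            cleanUp()
            return
        }
    }
}

Keeping the select loop as the single place that touches the file is what lets you skip the locking discussed above; the worker goroutines simply die with the process when main returns.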
Upvotes: 1