My intention is to query BigQuery and index some fields in Elasticsearch using Go. It will be a one-time batch job. Since the team has knowledge of Java, we decided to benchmark both languages. I have noticed that Go is much slower when reading rows the "iterator way".
Why is there such a difference in time?
Am I missing some client or query configuration in Go, or is this the expected behavior?
How can I improve this reading time?
Here are both versions, Kotlin and Go (I have simplified the code):
Go 1.16.3
...
type Test struct {
	TestNo    *big.Rat              `bigquery:"testNo,nullable"`
	TestId    bigquery.NullString   `bigquery:"testId"`
	TestTime  bigquery.NullDateTime `bigquery:"testTime"`
	FirstName bigquery.NullString   `bigquery:"firstName"`
	LastName  bigquery.NullString   `bigquery:"lastName"`
	Items     []ItemTest            `bigquery:"f0_"`
}

type ItemTest struct {
	ItemType  bigquery.NullString `bigquery:"itemType"`
	ItemNo    bigquery.NullString `bigquery:"itemNo"`
	ProductNo *big.Rat            `bigquery:"productNo,nullable"`
	Qty       *big.Rat            `bigquery:"qty,nullable"`
	Name      bigquery.NullString `bigquery:"name"`
	Price     *big.Rat            `bigquery:"price,nullable"`
}
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
	// TODO: Handle error.
}
q := client.Query(myQuery)
it, err := q.Read(ctx)
if err != nil {
	// TODO: Handle error.
}
var end time.Duration // total time spent inside it.Next (reading + decoding)
for {
	start := time.Now() // plain time.Now() keeps the monotonic clock reading; .UTC() strips it
	var t Test
	err := it.Next(&t)
	if err == iterator.Done {
		break
	}
	if err != nil {
		// TODO: Handle error.
	}
	end += time.Since(start)
	IndexToES(t) // indexing time is deliberately excluded from end
}
fmt.Println(end) // 13 minutes
...
This takes 13 minutes to read the rows and map them to Go structs.
Kotlin
...
val start: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)
val bigquery = BigQueryOptions.newBuilder()
    .setCredentials(credentials)
    .setProjectId(PROJECT_ID)
    .build()
    .service
val queryConfig = QueryJobConfiguration.newBuilder(query).build()
val tableResult = bigquery.query(queryConfig)
val test = tableResult.iterateAll()
    .map { myMapper.mapToTest(it) }
val end: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)
logResults(start, end) // 60000 ms = 1 minute

fun logResults(start: BigDecimal, end: BigDecimal) {
    println("query: " + (end - start).setScale(0) + "ms")
}
// iterate through test and index at the same time
...
This takes 1 minute.
Answer:
Neither snippet is complete, so it is unclear whether this is an apples-to-apples comparison. If you want to know where the time goes in the Go program, profile it with pprof.
Another thing to point out: if you're reading millions of rows of query output, you'll want to look at the BigQuery Storage API. Using it instead of the iterators you're currently benchmarking can make reads faster in both languages.