Idemas

Reputation: 39

Go iterator reading 1 million rows from BigQuery is 10x slower than Java or Kotlin?

My intention is to query BigQuery and index some fields in Elasticsearch using Go. It will be a one-time batch job. Since the team has experience with Java, we decided to benchmark both languages. I have noticed that Go is slow when reading results the "iterator way".

Why is there such a difference in time?

Am I missing some client or query configuration in Go, or is this the expected behavior?

How can I improve this reading time?

Code for both Java/Kotlin and Go:

(I have simplified the code)

Go 1.16.3

...

type Test struct {
    TestNo    *big.Rat              `bigquery:"testNo,nullable"`
    TestId    bigquery.NullString   `bigquery:"testId"`
    TestTime  bigquery.NullDateTime `bigquery:"testTime"`
    FirstName bigquery.NullString   `bigquery:"firstName"`
    LastName  bigquery.NullString   `bigquery:"lastName"`
    Items     []ItemTest            `bigquery:"f0_"`
}

type ItemTest struct {
    ItemType  bigquery.NullString `bigquery:"itemType"`
    ItemNo    bigquery.NullString `bigquery:"itemNo"`
    ProductNo *big.Rat            `bigquery:"productNo,nullable"`
    Qty       *big.Rat            `bigquery:"qty,nullable"`
    Name      bigquery.NullString `bigquery:"name"`
    Price     *big.Rat            `bigquery:"price,nullable"`
}


ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
    // TODO: Handle error.
}


q := client.Query(myQuery)

it, err := q.Read(ctx)
if err != nil {
    // TODO: Handle error.
}


var end time.Duration // total time spent inside it.Next

for {
    start := time.Now().UTC()

    var t Test
    err := it.Next(&t)
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }

    end += time.Since(start)

    IndexToES(t)
   
}

fmt.Println(end) //13 minutes.

...

It takes 13 minutes to read the rows and map them to Go structs.

Kotlin

...

val start: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)

val bigquery = BigQueryOptions.newBuilder()
            .setCredentials(credentials)
            .setProjectId(PROJECT_ID)
            .build()
            .service

val queryConfig = QueryJobConfiguration.newBuilder(query).build()

val tableResult = bigquery.query(queryConfig)

val test = tableResult.iterateAll()
            .map { myMapper.mapToTest(it) }

val end: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)


logResults(start, end) // 60000 ms = 1 minute

fun logResults(start: BigDecimal, end: BigDecimal){
    println("query: " + (end - start).setScale(0) + "ms")
}

// iterate through test and index at the same time
...

Takes 1 minute...

Upvotes: 1

Views: 1239

Answers (1)

shollyman

Reputation: 4384

Neither snippet is complete, so it is unclear if this is apples to apples. If you're wondering where the time is going in the Go program, consider leveraging pprof.

The other thing to point out is that if you're reading millions of rows of query output, you're going to want to take a look at the BigQuery Storage API. Using this rather than the iterators you're currently testing against can make this faster in both languages.

Upvotes: 2
