swish41ffl

Reputation: 147

Streaming data into BigQuery

I am trying to stream data into BQ from a Scala application. Looking at the samples listed at Streaming Data Into BigQuery, I see that the data needs to be passed in as a Map<String, Object> using TableDataInsertAllRequest.Rows().setJson() (sketched below).

  1. Is this the only way to pass data in?
  2. Given that this data will ultimately be streamed to BQ as JSON by the connector library, is it possible to pass it in as a JSON string instead of a Map<String, Object>? If not, is there a reason for this?
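
For reference, the pattern from the samples looks roughly like this in Java (a sketch only; "bigquery" is assumed to be an already-authenticated Bigquery client, and the project, dataset, table, and column names are placeholders):

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// each row must be built up as a Map<String, Object>, not a JSON string
Map<String, Object> row = new HashMap<String, Object>();
row.put("name", "example");
row.put("count", 42);

TableDataInsertAllRequest request = new TableDataInsertAllRequest()
    .setRows(Collections.singletonList(
        new TableDataInsertAllRequest.Rows().setJson(row)));

TableDataInsertAllResponse response = bigquery.tabledata()
    .insertAll("my-project", "my_dataset", "my_table", request)
    .execute();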

Upvotes: 2

Views: 2220

Answers (4)

mziccard

Reputation: 2178

I also suggest you look at the BigQuery API in gcloud-java, where you can use a TableDataWriteChannel to stream data into a BigQuery table. See the following example (where JSON_CONTENT is a JSON string):

// imports assume the gcloud-java packages current at the time of writing
import com.google.gcloud.bigquery.BigQuery;
import com.google.gcloud.bigquery.BigQueryOptions;
import com.google.gcloud.bigquery.FormatOptions;
import com.google.gcloud.bigquery.LoadConfiguration;
import com.google.gcloud.bigquery.TableDataWriteChannel;
import com.google.gcloud.bigquery.TableId;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

BigQuery bigquery = BigQueryOptions.defaultInstance().service();
TableId tableId = TableId.of("dataset", "table");
LoadConfiguration configuration = LoadConfiguration.builder(tableId)
    .formatOptions(FormatOptions.json())
    .build();
try (TableDataWriteChannel channel = bigquery.writer(configuration)) {
  channel.write(
      ByteBuffer.wrap(JSON_CONTENT.getBytes(StandardCharsets.UTF_8)));
} catch (IOException e) {
  // handle exception
}

TableDataWriteChannel uses resumable upload to stream data to the BigQuery table, which makes it more suitable for large files.

A TableDataWriteChannel can also be used to stream local files:

int chunkSize = 8 * 256 * 1024;
BigQuery bigquery = BigQueryOptions.defaultInstance().service();
TableId tableId = TableId.of("dataset", "table");
LoadConfiguration configuration = LoadConfiguration.builder(tableId)
    .formatOptions(FormatOptions.json())
    .build();
// imports as in the previous example, plus java.nio.channels.FileChannel
// and java.nio.file.Paths
try (FileChannel fileChannel = FileChannel.open(Paths.get("file.json"))) {
  TableDataWriteChannel writeChannel = bigquery.writer(configuration);
  long position = 0;
  // copy the file to the write channel one chunk at a time
  long written = fileChannel.transferTo(position, chunkSize, writeChannel);
  while (written > 0) {
    position += written;
    written = fileChannel.transferTo(position, chunkSize, writeChannel);
  }
  writeChannel.close();
} catch (IOException e) {
  // handle exception
}

For other examples of using gcloud-java-bigquery, have a look at BigQueryExample.

Upvotes: 1

ozarov

Reputation: 1061

I think you should be able to stream JSON content via the BigQuery API in gcloud-java by using the TableDataWriteChannel.

That means it should also be doable without gcloud-java (using the API client directly), though you may need to replicate some of the code the library writes for you.
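
For illustration, with the low-level API client that could look roughly like the following, using a load job with media upload instead of insertAll (a sketch only; "bigquery" is assumed to be an already-authenticated Bigquery client, and all IDs are placeholders):

import com.google.api.client.http.InputStreamContent;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.TableReference;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// one JSON object per line (newline-delimited JSON)
String jsonContent = "{\"name\":\"example\",\"count\":42}\n";

Job job = new Job().setConfiguration(new JobConfiguration()
    .setLoad(new JobConfigurationLoad()
        .setSourceFormat("NEWLINE_DELIMITED_JSON")
        .setDestinationTable(new TableReference()
            .setProjectId("my-project")
            .setDatasetId("my_dataset")
            .setTableId("my_table"))));

// upload the JSON bytes as the body of the load job
InputStreamContent content = new InputStreamContent("application/octet-stream",
    new ByteArrayInputStream(jsonContent.getBytes(StandardCharsets.UTF_8)));

Job response = bigquery.jobs().insert("my-project", job, content).execute();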

I highly recommend looking at gcloud-java, and feel free to add a feature request for supporting JSON content in the insertAll operation as well.

Upvotes: 1

Sean Chen

Reputation: 671

Unfortunately the generated libraries over our API (or any Google Cloud Platform API) don't support writing out the request body directly; it's likely this helps ensure the validity of requests. That said, there is active work on the client-library front, and a helper method seems like a reasonable request. The overhead of parsing into the client representation would likely remain for the aforementioned validation, but the client interface would be a bit simpler for your scenario.

I'll pass on your request. In the meantime, this question's answer mentions a library that should ease your translation work:

Convert Json to Map
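
For instance, with Jackson (one of the libraries commonly suggested there), the translation could look something like this; the sample JSON is made up:

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Map;

ObjectMapper mapper = new ObjectMapper();
try {
  // parse an arbitrary JSON object into the Map<String, Object>
  // shape that TableDataInsertAllRequest.Rows().setJson() expects
  Map<String, Object> row = mapper.readValue(
      "{\"name\":\"example\",\"count\":42}",
      new TypeReference<Map<String, Object>>() {});
} catch (IOException e) {
  // handle malformed JSON
}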

Upvotes: 1

Pentium10

Reputation: 208042

  1. That's the only way to stream data in. There is also batch loading for large files, documented here, but for that you need to move the file to GCS first and issue the import job from there (see the sketch after this list).

  2. For that, the answer is that the BQ connector library usually handles the JSON conversion itself; at least that's how it works in Java and PHP, so instead of a string you need to pass objects.
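
To illustrate the batch-loading route from point 1, an import job from GCS via the low-level API client might look roughly like this (a sketch only; "bigquery" is assumed to be an already-authenticated Bigquery client, and the bucket, project, dataset, and table names are placeholders):

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.TableReference;
import java.util.Collections;

// load a newline-delimited JSON file already uploaded to GCS
Job job = new Job().setConfiguration(new JobConfiguration()
    .setLoad(new JobConfigurationLoad()
        .setSourceUris(Collections.singletonList("gs://my-bucket/data.json"))
        .setSourceFormat("NEWLINE_DELIMITED_JSON")
        .setDestinationTable(new TableReference()
            .setProjectId("my-project")
            .setDatasetId("my_dataset")
            .setTableId("my_table"))));

Job response = bigquery.jobs().insert("my-project", job).execute();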

Upvotes: 0
