Reputation: 147
I am trying to stream data into BQ from a Scala application. Looking at the samples listed at Streaming Data Into BigQuery, I see that the data needs to be passed in as a Map<String, Object> using TableDataInsertAllRequest.Rows().setJson().
Is there a way to pass the row data as a JSON string rather than building a Map<String, Object>? If not, is there any reason for this?
Upvotes: 2
Views: 2220
Reputation: 2178
I also suggest you look at the BigQuery API in gcloud-java. In gcloud-java you can use a TableDataWriteChannel to stream data into a BigQuery table.
See the following example (where JSON_CONTENT is a string of JSON):
BigQuery bigquery = BigQueryOptions.defaultInstance().service();
TableId tableId = TableId.of("dataset", "table");
LoadConfiguration configuration = LoadConfiguration.builder(tableId)
    .formatOptions(FormatOptions.json())
    .build();
try (TableDataWriteChannel channel = bigquery.writer(configuration)) {
  channel.write(
      ByteBuffer.wrap(JSON_CONTENT.getBytes(StandardCharsets.UTF_8)));
} catch (IOException e) {
  // handle exception
}
TableDataWriteChannel uses resumable upload to stream data to the BigQuery table, which makes it well suited for large files.
A TableDataWriteChannel can also be used to stream local files:
int chunkSize = 8 * 256 * 1024;
BigQuery bigquery = BigQueryOptions.defaultInstance().service();
TableId tableId = TableId.of("dataset", "table");
LoadConfiguration configuration = LoadConfiguration.builder(tableId)
    .formatOptions(FormatOptions.json())
    .build();
try (FileChannel fileChannel = FileChannel.open(Paths.get("file.json"));
     WriteChannel writeChannel = bigquery.writer(configuration)) {
  long position = 0;
  long written = fileChannel.transferTo(position, chunkSize, writeChannel);
  while (written > 0) {
    position += written;
    written = fileChannel.transferTo(position, chunkSize, writeChannel);
  }
}
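The chunked-copy loop above isn't specific to BigQuery; the same pattern can be sketched with plain java.nio channels and no gcloud-java dependency, here copying into an in-memory sink instead of a WriteChannel:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChunkedCopy {
    // Copies everything from in to out in chunkSize pieces and returns the
    // total number of bytes written, mirroring the transferTo loop above.
    static long copy(ReadableByteChannel in, WritableByteChannel out, int chunkSize)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
        long total = 0;
        while (in.read(buffer) != -1) {
            buffer.flip();
            while (buffer.hasRemaining()) {
                total += out.write(buffer);
            }
            buffer.clear();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello bigquery".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = copy(
            Channels.newChannel(new ByteArrayInputStream(data)),
            Channels.newChannel(sink), 4);
        System.out.println(written); // 14
    }
}
```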
For other examples on gcloud-java-bigquery you can have a look at BigQueryExample.
Upvotes: 1
Reputation: 1061
I think you should be able to stream JSON content via the BigQuery API in gcloud-java by using the TableDataWriteChannel.
That means it should also be doable without gcloud-java (using the API client directly), though you may need to repeat some of the work the library does for you.
I highly recommend looking at gcloud-java, and feel free to file a feature request for supporting JSON content in the insertAll operation as well.
Upvotes: 1
Reputation: 671
Unfortunately the generated libraries over our (or any Google Cloud Platform) API don't support directly writing out the request body. It's likely this aids in ensuring the validity of requests. That said, there is active work on the client library front, and a helper method seems like a reasonable request. The overhead would likely still be present (parse to client representation) for the aforementioned validation purposes, but the client interface would be a bit simpler for your scenario.
I'll pass on your request. In the meantime, this question's answer mentions a library that seems like it will ease your translation work.
Upvotes: 1
Reputation: 208042
That's the only way to stream data in. There is batch loading for large files, documented here, but for that you need to move the file to GCS and issue the import job from there.
As for the reason: usually the BQ connector library handles the conversion (at least that's how it works in Java and PHP), so instead of a string you need to pass objects.
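To illustrate the object-based approach: each row handed to insertAll is just a Map<String, Object> keyed by column name. A minimal sketch, with hypothetical column names:

```java
import java.util.HashMap;
import java.util.Map;

public class RowExample {
    // Builds one insertAll-style row; "name" and "age" are hypothetical columns.
    static Map<String, Object> buildRow(String name, long age) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", name);
        row.put("age", age);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(buildRow("alice", 30));
    }
}
```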
Upvotes: 0