Reputation: 115
What is the best way to export a PCollection<TableRow>, either as a BigQuery table or as a .csv file, when neither the .csv header nor the table schema is defined?
They are unknown because the PCollection<TableRow> is the result of a BigQueryIO.Read query, which does not return a schema. As a workaround, the column names of the resulting table rows could be parsed from the string that was used in the query.
Example:
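// A Java string literal cannot span lines, so the query is concatenated.
String query = "SELECT nationality, COUNT(DISTINCT personID) AS population "
    + "FROM Dataset.Table "
    + "GROUP BY nationality";
// Rows come back as TableRow objects, without an accompanying schema.
PCollection<TableRow> result = p.apply(BigQueryIO.Read.fromQuery(query));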
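To make the parsing workaround concrete, here is a minimal sketch of how the output column names might be pulled out of the query string. The class SelectColumns and its methods are hypothetical names made up for illustration, not part of any SDK. It assumes a single flat SELECT ... FROM with no subquery in the select list, and it does not handle expressions without an AS alias (BigQuery generates a name for those), so it should not be read as a general SQL parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SelectColumns {
    // Captures everything between the leading SELECT and the first FROM.
    private static final Pattern SELECT_LIST = Pattern.compile(
        "^\\s*SELECT\\s+(.*?)\\s+FROM\\s", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Captures a trailing "AS alias" on a single select item.
    private static final Pattern ALIAS = Pattern.compile(
        "\\s+AS\\s+(\\w+)$", Pattern.CASE_INSENSITIVE);

    static List<String> columnNames(String query) {
        Matcher m = SELECT_LIST.matcher(query);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a simple SELECT ... FROM query");
        }
        String selectList = m.group(1);
        List<String> names = new ArrayList<>();
        int depth = 0;      // parenthesis nesting, so commas inside f(a, b) are ignored
        int itemStart = 0;
        for (int i = 0; i <= selectList.length(); i++) {
            // Treat the end of the list as a final separator.
            char c = i < selectList.length() ? selectList.charAt(i) : ',';
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                names.add(nameOf(selectList.substring(itemStart, i).trim()));
                itemStart = i + 1;
            }
        }
        return names;
    }

    // Uses the alias if present, otherwise the item with any "table." prefix removed.
    private static String nameOf(String item) {
        Matcher alias = ALIAS.matcher(item);
        if (alias.find()) {
            return alias.group(1);
        }
        return item.substring(item.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        String query = "SELECT nationality, COUNT(DISTINCT personID) AS population "
            + "FROM Dataset.Table "
            + "GROUP BY nationality";
        System.out.println(columnNames(query)); // prints [nationality, population]
    }
}
```

This only recovers names, not types, so on its own it would be enough for a .csv header but not for a full table schema.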
What I would like to do is make a function that automatically exports a .csv or a table, without manually defining the schema or .csv header for every query result.
Any suggestions? Thanks in advance!
Upvotes: 0
Views: 747
Reputation: 2247
Let me add to the existing accepted answer to the other question:
Alternatively, you could make a separate query to BigQuery directly via the jobs.query API method at pipeline construction time, and pass the schema from its result to the BigQueryIO.Write transform.
There should be little or no cost for the query to determine the schema. You simply need to set the dryRun flag in your query and then there will be no bytes processed.
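For the .csv side of the question, once the field names are known (for example from a schema obtained as described above), the header and the rows can be generated from them rather than written by hand. The sketch below is only an illustration: CsvLines and its methods are hypothetical names, and it takes a plain Map<String, Object> so that it runs without the SDK on the classpath. TableRow is itself a Map<String, Object>, so a TableRow could be passed in directly. Nested or repeated fields are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class CsvLines {
    // Quotes a value if it contains a comma, a quote, or a line break.
    static String escape(Object value) {
        if (value == null) {
            return "";
        }
        String s = value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Builds the header line from the field names, in order.
    static String header(List<String> fieldNames) {
        List<String> cells = new ArrayList<>();
        for (String name : fieldNames) {
            cells.add(escape(name));
        }
        return String.join(",", cells);
    }

    // Builds one data line, reading the cells in the same field order as the header.
    static String row(List<String> fieldNames, Map<String, Object> row) {
        List<String> cells = new ArrayList<>();
        for (String name : fieldNames) {
            cells.add(escape(row.get(name)));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("nationality", "population");
        Map<String, Object> r = new HashMap<>();
        r.put("nationality", "Dutch");
        r.put("population", 42);
        System.out.println(header(fields)); // prints nationality,population
        System.out.println(row(fields, r)); // prints Dutch,42
    }
}
```

Inside a pipeline, the row method could be called from a transform that maps each TableRow to a String before a text sink. The header would have to be written separately, since elements of a PCollection have no guaranteed order.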
Upvotes: 1