Reputation: 175
I am getting "Transport error during HTTP request" on google dataflow on using BigQueryIO.Read
.
Code:
PipelineOptionsFactory.register(UserClickOptions.class);
DataflowPipelineOptions options =
PipelineOptionsFactory.fromArgs(args).as(UserClickOptions.class);
Pipeline p = Pipeline.create(options);
......
final PCollection<TableRow> result =
p.apply(BigQueryIO.Read.named("lookup")
.fromQuery(Query.getLookupQuery()));
My query is like:
SELECT ROW_NUMBER() OVER() as Id, prop11 as X FROM (
SELECT prop11 FROM
(TABLE_DATE_RANGE([Mapping.network_v2_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'),CURRENT_TIMESTAMP()))
WHERE (REGEXP_MATCH(pagename,"^[1-9][^404]"))
AND (prop11 IS NOT null AND prop11 !="")
GROUP EACH BY prop11;
if run the query on Bigquery console it works and pipeline works for 3 months of data. Rows around 12 millions for a year.
Exception trace:
(9e64c4000646eed): Workflow failed. Causes: (d785ff42d5d40091):
S06:CPNlookup+ParDo(Cpn2Csv)+ParDo(Row2KV)
+cpn2File/FileBasedSink.ReshardForWrite/Window.Into()
+cpn2File/FileBasedSink.ReshardForWrite/RandomKey
+cpn2File/FileBasedSink.ReshardForWrite/GroupByKey/Reify
+cpn2File/FileBasedSink.ReshardForWrite/GroupByKey/Write failed.,
(eec2b6fa988abf05): BigQuery: query execution in project "cip-newsuk-data" failed.,
(eec2b6fa988abbce): BigQuery execution failed.,
(eec2b6fa988ab897): Transport error during HTTP request.
Update 12/Apr/2006 :: I am getting these failure for a day worth of data but frequency is very less compare 3 months of data.
Upvotes: 2
Views: 405
Reputation: 6130
We think we tracked this down and hope it will get fixed in the near future. In the meantime, there may be a work around that gets you unblocked for now.
The error you are seeing might be avoided if you try using TABLE_DATE_RANGE_STRICT
instead of TABLE_DATE_RANGE
. This will require that all tables expected by the date range are present, which may or may not be the case for you.
As an example, if you change your query to:
SELECT ROW_NUMBER() OVER() as Id, prop11 as X FROM (
SELECT prop11 FROM
(TABLE_DATE_RANGE_STRICT([Mapping.network_v2_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'),CURRENT_TIMESTAMP()))
WHERE (REGEXP_MATCH(pagename,"^[1-9][^404]"))
AND (prop11 IS NOT null AND prop11 !="")
GROUP EACH BY prop11;
See the documentation on table wildcard functions for more details on TABLE_DATE_RANGE_STRICT
.
Upvotes: 3