Datageek

Reputation: 26679

Debugging Dataflow template GCS to BigQuery

I am getting some strange errors that are difficult to debug. I am running a simple JavaScript UDF that maps JSON data before it is imported into BigQuery. I've run other UDFs before and never encountered errors like these.

Is there any way to debug UDF errors in Dataflow templates, either with an actual debugger or at least with console.log or something similar?

The error in question:

exception: "java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:183)
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:101)
    at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:54)
    at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:37)
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:114)
    ...

It's very difficult to tell what this error is about: is it malformed input data, or malformed JSON output from the UDF?
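Since the message comes from org.json and says the text does not begin with '{', one way to narrow it down is to check every JSON text the job reads (input lines, UDF output, schema file) against that one condition. A minimal sketch: the sample texts are hypothetical, and the check only approximates org.json's "must begin with '{'" rule; it does not reproduce the template's actual parsing.

```javascript
// Sketch: find which JSON text fails the condition named in the error message.
// org.json skips leading whitespace and then requires '{' for a JSONObject;
// trimStart() approximates that whitespace skipping.
function checkJsonObjectText(label, text) {
  const first = text.trimStart().charAt(0);
  if (first !== "{") {
    return `${label}: FAIL, first character is ${JSON.stringify(first)}, not '{'`;
  }
  try {
    JSON.parse(text); // begins with '{', so a successful parse means a JSON object
    return `${label}: ok`;
  } catch (err) {
    return `${label}: FAIL, begins with '{' but is not valid JSON (${err.message})`;
  }
}

// Hypothetical samples of the three JSON texts a GCS-to-BigQuery job typically reads.
const candidates = {
  "input line": '{"name": "Ann", "age": 31}',
  "UDF output": JSON.stringify({ name: "Ann", age: 31 }),
  "schema file": '[{"name": "name", "type": "STRING"}]', // bare array, no wrapper object
};

const report = Object.entries(candidates).map(([label, text]) =>
  checkJsonObjectText(label, text)
);
report.forEach((line) => console.log(line));
```

Pasting a real input line, a real UDF return value and the real schema file into the candidates shows which one trips the '{' check.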

I've tried everything so far:

Any tips on debugging Dataflow JavaScript UDFs would be highly appreciated.

Is the source code of these Java classes available anywhere online?

Upvotes: 2

Views: 1429

Answers (2)

Ricardo Mendes

Reputation: 437

Be careful when copying the sample schema for the "Text Files on Cloud Storage to BigQuery" template from the Cloud Console: its key comes out as "BigQuery  Schema", with two blank spaces between the words instead of one.


Using a schema based on that Cloud Console sample, I received java.lang.RuntimeException: org.json.JSONException: JSONObject["BigQuery Schema"] not found multiple times before figuring out what was going on.
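A small pre-upload check can catch this kind of near-miss key. This is a minimal sketch, assuming the schema file is a JSON object whose field list sits under the exact key "BigQuery Schema"; the whitespace and case normalization used to spot near-misses is an illustrative choice, not something the template does.

```javascript
// Sketch: detect a schema key that looks like "BigQuery Schema" but is not an exact
// match, e.g. two spaces between the words as in the copied Cloud Console sample.
const EXPECTED_KEY = "BigQuery Schema";

function findSchemaKeyProblems(schemaText) {
  const parsed = JSON.parse(schemaText);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return ["top level is not a JSON object"];
  }
  if (Object.prototype.hasOwnProperty.call(parsed, EXPECTED_KEY)) {
    return []; // exact key present
  }
  // Collapse whitespace runs and ignore case to find keys that only *look* right.
  const normalize = (k) => k.trim().replace(/\s+/g, " ").toLowerCase();
  const nearMisses = Object.keys(parsed).filter(
    (k) => normalize(k) === normalize(EXPECTED_KEY)
  );
  if (nearMisses.length > 0) {
    return nearMisses.map(
      (k) => `key ${JSON.stringify(k)} is not exactly ${JSON.stringify(EXPECTED_KEY)}`
    );
  }
  return [`missing key ${JSON.stringify(EXPECTED_KEY)}`];
}

// Hypothetical schema text with the double-space key from the console sample.
const copiedFromConsole = '{"BigQuery  Schema": [{"name": "name", "type": "STRING"}]}';
console.log(findSchemaKeyProblems(copiedFromConsole));
```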

Upvotes: 0

Datageek

Reputation: 26679

In this case the culprit turned out to be the BigQuery schema file, whose field list needs to be wrapped in a JSON object under the "BigQuery Schema" key:

{
  "BigQuery Schema": [
    ... schema goes here
  ]
}
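If the schema file was written as a bare array of fields, it can be wrapped before uploading. A minimal sketch, assuming the input is either a bare field array or an object that already has a "BigQuery Schema" array; the function name and sample fields are hypothetical.

```javascript
// Sketch: ensure a schema file has the {"BigQuery Schema": [...]} wrapper object.
// A bare array starts with '[', which is what an org.json JSONObject parser rejects.
function wrapBigQuerySchema(schemaText) {
  const parsed = JSON.parse(schemaText);
  if (Array.isArray(parsed)) {
    // Bare list of field definitions: wrap it under the expected key.
    return JSON.stringify({ "BigQuery Schema": parsed }, null, 2);
  }
  if (parsed !== null && typeof parsed === "object" && Array.isArray(parsed["BigQuery Schema"])) {
    return JSON.stringify(parsed, null, 2); // already in the expected shape
  }
  throw new Error('expected a field array or an object with a "BigQuery Schema" array');
}

// Hypothetical bare schema, as it might be written by hand.
const bareFields =
  '[{"name": "name", "type": "STRING"}, {"name": "age", "type": "INTEGER"}]';
const wrapped = wrapBigQuerySchema(bareFields);
console.log(wrapped);
```

Rejecting anything else (rather than guessing) keeps a typo such as a misspelled key from being silently passed through.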

The template's source code could be useful for debugging: TextIOToBigQuery.java

See the repo: https://github.com/GoogleCloudPlatform/DataflowTemplates
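For console.log-style debugging of the UDF itself, one option is to evaluate the UDF file locally in Node.js and feed it sample input lines before uploading it. This is a minimal sketch, not part of the template: it assumes the UDF takes one input line as a string and returns a JSON string, and the function name "transform" and the sample lines are hypothetical.

```javascript
// Local harness for a Dataflow JavaScript UDF (a sketch; not part of the template).
// Assumption: the UDF takes one input line (string) and returns a JSON string.
const vm = require("node:vm");

// In practice, load this with require("node:fs").readFileSync("udf.js", "utf8").
const udfSource = `
function transform(line) {
  var obj = JSON.parse(line);
  console.log("UDF input:", line); // visible because console is passed in below
  return JSON.stringify({ name: obj.name, age: obj.age });
}
`;

// Evaluate the UDF in a fresh context; its top-level function declarations
// become properties of the sandbox object.
const sandbox = { console };
vm.runInNewContext(udfSource, sandbox);

// Run the UDF on each sample line, recording failures instead of stopping at the first one.
function runUdf(udf, lines) {
  return lines.map((line) => {
    try {
      return { line, ok: true, out: udf(line) };
    } catch (err) {
      return { line, ok: false, error: String(err) };
    }
  });
}

const results = runUdf(sandbox.transform, [
  '{"name": "Ann", "age": 31}',
  "name,age", // a malformed (CSV-style) line, to show how failures are reported
]);
results.forEach((r) => console.log(r));
```

Each result shows either the exact string the UDF returned or the error it threw for that line, which separates UDF problems from problems elsewhere in the job.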

Upvotes: 4
