Reputation: 26679
I am getting some strange errors that are difficult to debug. I am running a simple UDF JavaScript mapper which maps the JSON data and imports it into BigQuery. I've run other UDF functions previously and never encountered such errors.
Is there any way to debug (with the actual debugger or at least with console.log or similar) the Dataflow templates UDF errors?
The error in question:
exception: "java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:183)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:101)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:54)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:37)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:114)
...
It's very difficult to tell what this error refers to: is the input data malformed, or is it the JSON output from the UDF?
I've tried everything so far:
{}
Any tips on debugging Dataflow UDF JavaScript would be highly appreciated.
Is the source code of these Java classes available anywhere online?
Upvotes: 2
Views: 1429
Reputation: 437
Be careful when copying the sample schema for the "Text Files on Cloud Storage to BigQuery" template from the Cloud Console: it contains "BigQuery  Schema", with two blank spaces between the words.
I received java.lang.RuntimeException: org.json.JSONException: JSONObject["BigQuery Schema"] not found multiple times before figuring out what was going on, because my schema was based on that Cloud Console sample.
Upvotes: 0
Reputation: 26679
In this case the culprit turned out to be the BigQuery schema, which needs to be wrapped in a JSON object under the "BigQuery Schema" key:
{
  "BigQuery Schema": [
    ... schema goes here
  ]
}
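Both schema errors in this thread (a file that does not start with '{', and a key that is not exactly "BigQuery Schema") could be caught locally before launching the job. Below is a minimal sketch of such a check, runnable in Node.js; the helper name checkSchemaText and the sample field in the usage note are hypothetical and not part of the template.

```javascript
// Hypothetical local helper (not part of the Dataflow template):
// returns null when the schema text looks right, or a message describing the problem.
function checkSchemaText(text) {
  var parsed;
  try {
    parsed = JSON.parse(text);
  } catch (e) {
    return "not valid JSON: " + e.message;
  }
  // The template expects a JSON object at the top level, not a bare array.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return "top level must be a JSON object, not an array or scalar";
  }
  // The key must match exactly; a key with two spaces will not be found.
  if (!Array.isArray(parsed["BigQuery Schema"])) {
    return 'missing "BigQuery Schema" array (check for extra spaces in the key)';
  }
  return null; // no problem found
}
```

For example, checkSchemaText('[{"name":"id","type":"STRING"}]') reports the top-level problem, while the same array wrapped as {"BigQuery Schema": [...]} passes.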
The following code could be useful for debugging: TextIOToBigQuery.java
See the repo: https://github.com/GoogleCloudPlatform/DataflowTemplates
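To tell input problems apart from schema or output problems, the UDF itself can be made to fail loudly. This is only a sketch under assumptions: the function name transform and the fields id and name are hypothetical, and it assumes the UDF receives one line of text and returns a JSON string. A thrown error should then surface in the job's exception text together with the offending input.

```javascript
// Hypothetical UDF; use whatever function name you pass to the template.
// Written in ES5 style in case the template's JavaScript engine lacks newer syntax.
function transform(line) {
  var record;
  try {
    record = JSON.parse(line);
  } catch (e) {
    // Rethrow with the raw input so the failing line can be identified.
    throw new Error("UDF input is not valid JSON: " + e.message + " | input: " + line);
  }
  // Hypothetical mapping: keep only the fields that exist in the BigQuery schema.
  var out = { id: record.id, name: record.name };
  return JSON.stringify(out);
}
```

Because the function is plain JavaScript, it can also be run locally in Node.js against a few sample lines before it is uploaded to Cloud Storage.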
Upvotes: 4