Reputation: 713
I'm calling Pub/Sub via a REST request. I'm trying to publish columnar data to a Pub/Sub topic; from there it goes into Dataflow, and finally into BigQuery, where a table has been defined.
This is the layout of the JSON data:
[
  {
    "age": "58",
    "job": "management",
    "marital": "married",
    "education": "tertiary",
    "default": "no",
    "balance": "2143",
    "housing": "yes",
    "loan": "no",
    "contact": "unknown",
    "day": "5",
    "month": "may",
    "duration": "261",
    "campaign": "1",
    "pdays": "-1",
    "previous": "0",
    "poutcome": "unknown",
    "y": "no"
  }
]
Now, to form the correct JSON body, this needs to go into the following request for Pub/Sub to recognise it:
{
  "messages": [{
    "attributes": {
      "key": "",
      "value": "en"
    },
    "data": "%DATA%"
  }]
}
The Pub/Sub REST reference states that the "data" field must be Base64-encoded, so that is what I do. The final JSON is as follows (%DATA% is replaced with the Base64 encoding of the original message data):
{
  "messages": [{
    "attributes": {
      "key": "",
      "value": "en"
    },
    "data": "Ww0KICB7DQogICAgImFnZSI6ICI1OCIsDQogICAgImpvYiI6ICJtYW5hZ2VtZW50IiwNCiAgICAibWFyaXRhbCI6ICJtYXJyaWVkIiwNCiAgICAiZWR1Y2F0aW9uIjogInRlcnRpYXJ5IiwNCiAgICAiZGVmYXVsdCI6ICJubyIsDQogICAgImJhbGFuY2UiOiAiMjE0MyIsDQogICAgImhvdXNpbmciOiAieWVzIiwNCiAgICAibG9hbiI6ICJubyIsDQogICAgImNvbnRhY3QiOiAidW5rbm93biIsDQogICAgImRheSI6ICI1IiwNCiAgICAibW9udGgiOiAibWF5IiwNCiAgICAiZHVyYXRpb24iOiAiMjYxIiwNCiAgICAiY2FtcGFpZ24iOiAiMSIsDQogICAgInBkYXlzIjogIi0xIiwNCiAgICAicHJldmlvdXMiOiAiMCIsDQogICAgInBvdXRjb21lIjogInVua25vd24iLA0KICAgICJ5IjogIm5vIg0KICAgIH0NCl0="
  }]
}
Pub/Sub accepts this data and passes it to Dataflow, but this is where everything breaks. Dataflow tries to deserialize the message, but that fails with the following error:
(efdf538fc01f50b0): java.lang.RuntimeException: Unable to parse input$JsonToTableRow$1.apply($JsonToTableRow$1.apply(
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of out of START_ARRAY token
at [Source: [{"age":"32","job":"\"admin.\"","marital":"\"single\"","education":"\"secondary\"","default":"\"no\"","balance":"5","housing":"\"yes\"","loan":"\"no\"","contact":"\"unknown\"","day":"12","month":"\"may\"","duration":"593","campaign":"2","pdays":"-1","previous":"0","poutcome":"\"unknown\"","y":"\"no\""}]; line: 1, column: 1]
I think it is something to do with how the "data" field is being formatted, but I've tried other methods and I just can't get anything to work.
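One way to check what the pipeline actually receives is to decode the "data" value and look at the top-level JSON type. The sketch below assumes a hypothetical helper, inspect_data_field, and a shortened sample payload shaped like the one in the question (an array wrapping a single object); it is a diagnostic aid, not part of the pipeline.

```python
import base64
import json


def inspect_data_field(data_b64):
    """Decode a Pub/Sub "data" value and report its top-level JSON type."""
    text = base64.b64decode(data_b64).decode("utf-8")
    parsed = json.loads(text)
    return type(parsed).__name__, parsed


# Hypothetical sample: a JSON array wrapping one object, with CRLF line
# endings like the payload in the question.
sample = base64.b64encode(b'[\r\n  {"age": "58", "y": "no"}\r\n]').decode("ascii")
kind, parsed = inspect_data_field(sample)
print(kind)  # list
```

A result of list rather than dict means the payload starts with an array, which matches the START_ARRAY token in the error message.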
Upvotes: 0
Views: 1487
Reputation: 119
Try serializing your JSON data with ProtoBuf, deserializing it after reading it in the Beam pipeline (assuming you are using Apache Beam), and encoding it into a byte string before writing it to BigQuery.
Upvotes: 0
Reputation: 713
After further experimentation, the issue was indeed how the JSON was formatted. After I removed the opening [ and closing ], so that the payload was a single JSON object rather than an array, Dataflow was able to recognise the data and put it into BigQuery.
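If the source data arrives as an array, the same fix can be applied programmatically by publishing one message per object. The following is a sketch under the assumption that the pipeline expects exactly one JSON object per message, as the error and the fix suggest; split_into_messages and the sample rows are hypothetical names and data for illustration.

```python
import base64
import json


def split_into_messages(payload_text):
    """Turn a JSON array (or a single object) into one message per object."""
    parsed = json.loads(payload_text)
    # Accept either a bare object or an array of objects.
    records = parsed if isinstance(parsed, list) else [parsed]
    messages = []
    for record in records:
        data = json.dumps(record).encode("utf-8")
        messages.append({"data": base64.b64encode(data).decode("ascii")})
    return {"messages": messages}


# Hypothetical input: two shortened rows wrapped in an array.
wrapped = '[{"age": "58", "y": "no"}, {"age": "32", "y": "no"}]'
body = split_into_messages(wrapped)
for message in body["messages"]:
    # Each decoded payload now starts with "{" instead of "[".
    print(base64.b64decode(message["data"]).decode("utf-8"))
```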
Upvotes: 5