Reputation: 681
I have a data stream containing Strings
which look like JSONArrays
. I want to parse those Strings and write to BigQuery table using Apache Beam but am getting an error while writing repeated Strings.
Here´s how I convert my string to TableRow
:
String dataString = "[{\"EMAIL\": [\"[email protected]\"]}]";
JSONArray jsonArray = new JSONArray(dataString);
TableRow tableRow = new TableRow();
for (int i = 0; i < jsonArray.length(); i++) {
JSONArray emailArray = new JSONArray(jsonArray.getJSONObject(i).get("EMAIL").toString());
tableRow.set("EMAIL", emailArray); //Results in error
}
Here´s what my BigQuery schema looks like:
[
{
"name": "EMAIL",
"type": "STRING",
"mode": "REPEATED"
}
]
I have managed to write a similar repeated String to BigQuery table using Python but unable to do it using Apache Beam. I suppose I am not saving the right key-value pair in TableRow
. The error I am getting now is:
java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"email","message":"This field is not a record.","reason":"invalid"}],"index":0}]
I need help regarding how to save a similar repeated String to BigQuery without creating a record and would appreciate any advice or suggestions. Thanks in advance.
Upvotes: 3
Views: 865
Reputation: 1499
It seems you want to create
Note that is seems your ValidFrom
field is of type STRING
, not a repeated field, unless it is wrapped in a repeated field in a hierarchical schema.
In the example code you provided, you are creating a JSONArray
and putting it into the STRING
field, which I think cause issues as the types are incompatible. If you want to keep it as a plain STRING
field, you can use Solution 1 below.
Also make sure that the name of your column in BigQuery matches the one in your code, I see you use both ValidFrom
and EMAIL
(might be a mistake in your posted code though).
In case you want to add one row with a concatenated String
field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String [] emails = ... // your extraction logic
// Build a concatenated string of emails
String allEmails = String.join(";", emails);
// Add the string field to the row
tableRow.set('EMAILS', allEmails);
In case you want to insert multiple rows, you your create multiple table rows:
// Find email addresses
String [] emails = ... // your extraction logic
// Build a row per email
for(String email: emails) {
// Initialize your final row
TableRow tableRow = new TableRow();
tableRow.set('EMAIL', email);
// TODO: do something with the row (add to list, or ...)
}
In case you want to add one row with a REPEATED STRING
field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String [] emails = ... // your extraction logic
// Build the repeated field
List<String> emailCells = new ArrayList<>();
for(String email: emails) {
emailCells.add(email);
}
// Add the repeated field to the row
tableRow.set('EMAILS', emailCells);
If this is not what you're aiming for, please provide some more details.
Upvotes: 2