I'm writing an ORC file using Groovy.
One of the columns is a String. The ORC column type is:
.addField("Name", TypeDescription.createString())
The column vector is:
BytesColumnVector vName = (BytesColumnVector) batch.cols[1]
The values to be assigned to vName may include nulls, but I can't get ORC to write a null value into the file.
Attempting to assign null through set(), setValue() or setRef() throws a NullPointerException, either at the point of assignment or later, when the batch row is written deeper inside ORC.
The closest I can get is this:
byte[] b = new byte[0]
vName.setRef(i, b, 0, 0)
but this puts an empty string into the data file, as shown in the following dump snippet (see the second column, 'Name'):
{"ProductID":355,"Name":"","MakeFlag":false,"StandardCost":0,"Weight":null,"ModifiedDate":"2014-02-08 10:01:36.827"}
Any thoughts on how to set a null string?
EDIT: With the answer to this question, I was able to complete some code to write the contents of a database table to ORC. It may be useful to people searching for ORC-related examples: https://www.linkedin.com/pulse/orc-adls-polybase-ron-dunn/
An empty string is what I use; I don't think there's another way to do it.
Just make sure you also mark the entry as null: set isNull[i] = true for that row, and set noNulls = false on the column vector.
Your code would ideally look like this:
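import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

BytesColumnVector vName = (BytesColumnVector) batch.cols[1];
byte[] EMPTY_BYTES = "".getBytes(StandardCharsets.UTF_8);
vName.setRef(i, EMPTY_BYTES, 0, 0); // placeholder value for row i
vName.isNull[i] = true;             // flag row i as null
vName.noNulls = false;              // the vector now contains at least one null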
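Why the empty bytes don't end up in the file: ORC's column vectors follow a convention where isNull[i] is only consulted when noNulls is false, and a flagged row's bytes are never read. The sketch below illustrates that convention with a simplified, hypothetical stand-in class (StringColumn), not ORC's real BytesColumnVector; the names StringColumn, setNullable and read are made up for illustration, and read() only approximates how a consumer is assumed to check the flags before touching the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ORC's BytesColumnVector.
// It only illustrates the isNull/noNulls convention; it is NOT the ORC class.
public class NullableStringColumnDemo {

    static final byte[] EMPTY_BYTES = new byte[0];

    static final class StringColumn {
        final byte[][] vector;
        final int[] start;
        final int[] length;
        final boolean[] isNull;
        boolean noNulls = true;

        StringColumn(int size) {
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
            isNull = new boolean[size];
        }

        // Same shape as the setRef call above: stores a reference, no copy.
        void setRef(int row, byte[] buf, int off, int len) {
            vector[row] = buf;
            start[row] = off;
            length[row] = len;
        }
    }

    // Hypothetical helper: stores a String, or marks the row null when value == null.
    static void setNullable(StringColumn col, int row, String value) {
        if (value == null) {
            col.setRef(row, EMPTY_BYTES, 0, 0); // placeholder bytes, never read
            col.isNull[row] = true;
            col.noNulls = false;
        } else {
            byte[] b = value.getBytes(StandardCharsets.UTF_8);
            col.setRef(row, b, 0, b.length);
            col.isNull[row] = false; // clear any stale flag from a reused batch
        }
    }

    // Approximates a consumer: check the null flags before decoding the bytes.
    static List<String> read(StringColumn col, int rows) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            if (!col.noNulls && col.isNull[i]) {
                out.add(null);
            } else {
                out.add(new String(col.vector[i], col.start[i], col.length[i],
                        StandardCharsets.UTF_8));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StringColumn col = new StringColumn(3);
        setNullable(col, 0, "abc");
        setNullable(col, 1, null);
        setNullable(col, 2, "");
        System.out.println(read(col, 3)); // prints [abc, null, ]
    }
}
```

Row 1 reads back as a real null while row 2 stays an empty string, which is the distinction the dump in the question was missing. With the real BytesColumnVector you would apply the same three assignments from setNullable to batch.cols[1].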