Ron Dunn

Reputation: 3078

How to set ORC BytesColumnVector value to NULL?

I'm writing an ORC file using Groovy.

One of the columns is a String. The ORC column type is:

.addField("Name", TypeDescription.createString())

The column vector is:

BytesColumnVector vName = (BytesColumnVector) batch.cols[1]

The values to be assigned to vName may include NULLs, but I can't get ORC to write a null value into its data.

Attempting to assign a null value through set(), setVal() or setRef() throws a NullPointerException, either at the point of assignment or later, when the batch row is written deeper within ORC.

The closest I can get is this:

byte[] b = new byte[0]
vName.setRef(i, b, 0, 0)

but this puts an empty string into the data file, as shown in the following dump snippet (see the second column, 'Name'):

{"ProductID":355,"Name":"","MakeFlag":false,"StandardCost":0,"Weight":null,"ModifiedDate":"2014-02-08 10:01:36.827"}

Any thoughts on how to set a null string?

EDIT: With the answer to this question, I was able to complete some code to write the contents of a database table to ORC. It may be useful to people searching for ORC-related examples: https://www.linkedin.com/pulse/orc-adls-polybase-ron-dunn/

Upvotes: 1

Views: 1569

Answers (1)

Omar Ali

Reputation: 8617

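An empty byte array is what I use as the placeholder value. I don't think there's another way to do it.

Just make sure you also mark the row as null (isNull[i] = true) and flag the column vector as containing nulls (noNulls = false). Those two flags, not the bytes themselves, are what make ORC write NULL instead of an empty string.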

Your code would ideally look like this:

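// requires: import java.nio.charset.StandardCharsets;
BytesColumnVector vName = (BytesColumnVector) batch.cols[1];
byte[] EMPTY_BYTES = "".getBytes(StandardCharsets.UTF_8);
vName.setRef(i, EMPTY_BYTES, 0, 0);  // placeholder so the slot never holds a null reference
vName.isNull[i] = true;              // mark row i as NULL
vName.noNulls = false;               // tell the writer this vector contains at least one null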
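
If the column mixes real values and nulls, it may be convenient to wrap the logic above in a small helper. The following is only a sketch: `NullableStringColumn`, `StubBytesVector` and `setNullable` are hypothetical names, and `StubBytesVector` is a stand-in that mirrors just the `isNull`, `noNulls` and `setRef` members used above, so the example compiles without the ORC jars on the classpath. With the real library you would presumably pass the `BytesColumnVector` itself.

```java
import java.nio.charset.StandardCharsets;

class NullableStringColumn {

    // Hypothetical stand-in for BytesColumnVector; only the members used here.
    static class StubBytesVector {
        boolean noNulls = true;      // true means "no row in this vector is null"
        final boolean[] isNull;      // per-row null flags
        final byte[][] vector;       // per-row byte buffers
        final int[] start;           // per-row offset into the buffer
        final int[] length;          // per-row byte count

        StubBytesVector(int size) {
            isNull = new boolean[size];
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
        }

        void setRef(int row, byte[] bytes, int offset, int len) {
            vector[row] = bytes;
            start[row] = offset;
            length[row] = len;
        }
    }

    private static final byte[] EMPTY_BYTES = new byte[0];

    // Writes value into the given row, or flags the row as NULL when value is null.
    static void setNullable(StubBytesVector col, int row, String value) {
        if (value == null) {
            col.setRef(row, EMPTY_BYTES, 0, 0); // placeholder, never a null reference
            col.isNull[row] = true;
            col.noNulls = false;
        } else {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            col.setRef(row, bytes, 0, bytes.length);
            col.isNull[row] = false;            // clear any flag left from an earlier use of the row
        }
    }

    public static void main(String[] args) {
        StubBytesVector vName = new StubBytesVector(2);
        setNullable(vName, 0, "Chainring");
        setNullable(vName, 1, null);
        System.out.println("row 1 is null: " + vName.isNull[1]);
    }
}
```

Clearing `isNull[row]` in the non-null branch is a defensive choice for the case where the same batch object is refilled without being reset in between.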

Upvotes: 7
