LizardKing
LizardKing

Reputation: 666

Apache NIFI ignores serialization.null.format in SelectHiveQL

I am trying to perform a select with the SelectHiveQL processor in Apache NIFI. The output format should be 'CSV'.

The result set I am expecting has some NULL, that I would like to see in my csv reported as NULL. However, the flow file shows empty string instead of NULL. For example I have a query:

select name, surname, birth_place from persons

Let's say we don't have the birth_place for person_1, I would expect the flow file to be something like:

person_1,surname_1,NULL

whereas I get:

person_1,surname_1,

I have read that in Hive these situations are usually handled with 'serialization.null.format = 'null''. However, that does not seem to change something when using the selectHive processor in Nifi. I don't want to use any regex replacing empty with NULL since I would like to know whether the actual value is NULL or empty.

Upvotes: 0

Views: 164

Answers (1)

LizardKing
LizardKing

Reputation: 666

I have found a workaround/solution: you can get your results in Avro in you SelectHiveQL, convert them to a JSON with 'ConvertAvroToJSON'.

Avro and JSON have support for null, so NULL fields will be returned as 'null'.

Finally you need to use a ConvertRecord with a JSONTreeReader as 'Record Reader' and CSVRecordSetWriter as 'Record Writer'.

In the properties of the Record Writer you can find several options, the one I was interested in was 'Null String'. You can then set 'NULL' to have your csv displaying NULL in place of nulls field.

Upvotes: 0

Related Questions