user6325753

Reputation: 577

Ingesting CSV data into Hive using NiFi

I am trying to ingest CSV data into a Hive database. For this purpose, I tried the following flow:

listFile --> FetchFile --> ConvertCSVToAvro --> ConvertAvroToOrc --> PutHDFS

The CSV data is converted into ORC format and loaded into HDFS. On top of this HDFS data, I am able to create a Hive external table.
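
For example, such an external table over the ORC output directory can be created with a DDL along these lines (the table name, columns, and HDFS path are placeholders, not taken from the actual flow):

    CREATE EXTERNAL TABLE IF NOT EXISTS my_csv_data (
      id INT,
      name STRING,
      amount DOUBLE
    )
    STORED AS ORC
    LOCATION '/user/nifi/orc_output';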

Now, I want to test with the PutHiveQL processor.

For this, do I need to convert the CSV data to Avro and then to JSON?

Can ORC data not be loaded directly into Hive?

If yes, do we have to create the Hive table manually, or is it created automatically?

Upvotes: 1

Views: 2664

Answers (1)

notNull

Reputation: 31490

We can create the Hive table in the NiFi flow itself.

The ConvertAvroToOrc processor adds a hive.ddl attribute to the flowfiles; using that attribute, we can create the table in Hive with the PutHiveQL processor.

listFile --> FetchFile --> ConvertCSVToAvro --> ConvertAvroToOrc --> PutHDFS -->
 ReplaceText(Always replace with ${hive.ddl}) --> PutHiveQL
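
For illustration, the hive.ddl attribute generated by ConvertAvroToOrc is a CREATE TABLE statement derived from the ORC schema, roughly of this shape (the table name and columns below are placeholders):

    CREATE EXTERNAL TABLE IF NOT EXISTS my_csv_data (id INT, name STRING, amount DOUBLE)
    STORED AS ORC

ReplaceText writes that statement into the flowfile content, and PutHiveQL executes it against Hive. If the table should point at the directory written by PutHDFS, a LOCATION clause can be appended to the replacement value in ReplaceText.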

Refer to this, where I have explained in detail the NiFi flow to create tables/partitions dynamically in Hive.

  • Once the ORC data is loaded into HDFS, create the table on top of that HDFS directory.
  • Use SelectHiveQL to read data back from the table; depending on the output format (CSV or Avro) selected in the processor, it produces a flowfile in that format (see the example below).
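
As a quick check, a SelectHiveQL processor configured with a simple query such as the one below (the table name is a placeholder) should return the ingested rows as a CSV or Avro flowfile, depending on the processor's output format setting:

    SELECT * FROM my_csv_data LIMIT 10;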

Upvotes: 3
