Reputation: 159
I have been trying to use Kafka Connect to stream data into HDFS with Hive integration turned on.
My use case requires me to use the "FieldPartitioner" as the partitioner class.
My problem is that I am unable to partition on multiple fields.
Example JSON:
{
"_id": "582d666ff6e02edad83cae28",
"index": "ENAUT",
"mydate": "03-01-2016",
"hour": 120000,
"balance": "$2,705.80"
}
I want to have partitions based on 'mydate' and 'hour'.
I tried the following configuration:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=fieldPartition_test_hdfs
hdfs.url=hdfs://quickstart.cloudera:8020
flush.size=3
partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
partition.field.name={mydate,hour}
locale=en
timezone=GMT
hive.database=weblogs
hive.integration=true
hive.metastore.uris=thrift://quickstart.cloudera:9083
schema.compatibility=BACKWARD
I also tried specifying partition.field.name as
partition.field.name={'mydate','hour'}
and
partition.field.name=mydate,hour
and many more such combinations.
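For reference, the single-field form of the same setting (which is presumably the only syntax FieldPartitioner accepts, given the behavior I am seeing) looks like this:
partition.field.name=mydate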
Any help on the issue would be greatly appreciated.
Thanks.
Upvotes: 0
Views: 767
Reputation: 159
I tried this every way possible and later started digging into the source code.
The code for FieldPartitioner is here!
And the last commit to the file, here, shows "Revert 'support multi partition fields'" (3 months ago), so support for multiple partition fields was added and then reverted.
Please do let me know if you guys have any other solution.
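In the meantime, one possible workaround is a custom partitioner that splits partition.field.name on commas and builds a nested Hive-style path. The sketch below is untested; it assumes the io.confluent.connect.hdfs.partitioner.Partitioner interface as it appears in the linked source, and the package and class name (com.example.MultiFieldPartitioner) are made up for illustration.

package com.example; // hypothetical package

import io.confluent.connect.hdfs.partitioner.Partitioner;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical partitioner: treats partition.field.name as a comma-separated
// list and encodes records into paths like mydate=03-01-2016/hour=120000.
public class MultiFieldPartitioner implements Partitioner {

  private List<String> fieldNames;
  private final List<FieldSchema> partitionFields = new ArrayList<>();

  @Override
  public void configure(Map<String, Object> config) {
    // Same config key FieldPartitioner reads, but split on commas here.
    String names = (String) config.get("partition.field.name");
    fieldNames = Arrays.asList(names.split(","));
    for (String name : fieldNames) {
      partitionFields.add(
          new FieldSchema(name, TypeInfoFactory.stringTypeInfo.toString(), ""));
    }
  }

  @Override
  public String encodePartition(SinkRecord sinkRecord) {
    // Assumes the record value is a Struct (i.e. data arrives with a schema).
    Struct struct = (Struct) sinkRecord.value();
    StringBuilder sb = new StringBuilder();
    for (String name : fieldNames) {
      if (sb.length() > 0) {
        sb.append("/");
      }
      // String.valueOf keeps the sketch short; the real FieldPartitioner
      // switches on the field's schema type instead.
      sb.append(name).append("=").append(String.valueOf(struct.get(name)));
    }
    return sb.toString();
  }

  @Override
  public String generatePartitionedPath(String topic, String encodedPartition) {
    return topic + "/" + encodedPartition;
  }

  @Override
  public List<FieldSchema> partitionFields() {
    return partitionFields;
  }
}

If that interface matches your version of the connector, you would build this into a jar, drop it on the Connect worker's classpath, and point the sink at it with partitioner.class=com.example.MultiFieldPartitioner.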
Upvotes: 1