Pritish Kamath

Reputation: 159

Multiple hive partitions with kafka-connect

I have been trying to use Kafka Connect to stream data into HDFS with Hive integration enabled.

My use case requires me to use the "FieldPartitioner" as the partitioner class.

My problem is that I am unable to partition on multiple fields.

My example JSON:

{
  "_id": "582d666ff6e02edad83cae28",
  "index": "ENAUT",
  "mydate": "03-01-2016",
  "hour": 120000,
  "balance": "$2,705.80"
}

I want to partition on the basis of 'mydate' and 'hour'.
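To make that concrete, assuming the default topics.dir, the directory layout I am hoping for would look something like this:

/topics/fieldPartition_test_hdfs/mydate=03-01-2016/hour=120000/&lt;data files&gt;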

I tried the following:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=fieldPartition_test_hdfs
hdfs.url=hdfs://quickstart.cloudera:8020
flush.size=3

partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
partition.field.name={mydate,hour}

locale=en
timezone=GMT

hive.database=weblogs
hive.integration=true
hive.metastore.uris=thrift://quickstart.cloudera:9083
schema.compatibility=BACKWARD

I also tried specifying partition.field.name as

partition.field.name={'mydate','hour'}

and

partition.field.name=mydate,hour

and many more such combinations

Any help on the issue would be greatly appreciated.

Thanks.

Upvotes: 0

Views: 767

Answers (1)

Pritish Kamath

Reputation: 159

I tried this every way possible and later started digging into the source code.

The code of FieldPartitioner is here!

And the last commit to the file, here, shows "Revert 'support multi partition fields'" from 3 months ago.
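Which, if I am reading it right, means the released connector only looks at a single field, so only a config along these lines works out of the box:

partition.field.name=mydate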

Please do let me know if you guys have any other solution.
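If anyone needs a stopgap in the meantime, one option seems to be a custom partitioner that splits partition.field.name on commas and builds a Hive-style path per record. Below is a rough, untested sketch written against the 3.x Partitioner interface; the package and class name are placeholders, the exact method signatures may differ in your version, and it assumes records arrive with a schema so that value() is a Struct:

package com.example;                                   // placeholder package

import io.confluent.connect.hdfs.partitioner.Partitioner;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Splits partition.field.name on commas and builds a Hive-style path
// such as mydate=03-01-2016/hour=120000 for every record.
public class MultiFieldPartitioner implements Partitioner {

  private List<String> fieldNames;
  private final List<FieldSchema> partitionFields = new ArrayList<>();

  @Override
  public void configure(Map<String, Object> config) {
    String names = (String) config.get("partition.field.name");
    fieldNames = new ArrayList<>();
    for (String name : names.split(",")) {
      String trimmed = name.trim();
      fieldNames.add(trimmed);
      // register every partition column as a Hive string column for simplicity
      partitionFields.add(new FieldSchema(trimmed, "string", ""));
    }
  }

  @Override
  public String encodePartition(SinkRecord sinkRecord) {
    Struct value = (Struct) sinkRecord.value();
    StringBuilder encoded = new StringBuilder();
    for (String name : fieldNames) {
      if (encoded.length() > 0) {
        encoded.append("/");
      }
      encoded.append(name).append("=").append(String.valueOf(value.get(name)));
    }
    return encoded.toString();               // e.g. mydate=03-01-2016/hour=120000
  }

  @Override
  public String generatePartitionedPath(String topic, String encodedPartition) {
    return topic + "/" + encodedPartition;
  }

  @Override
  public List<FieldSchema> partitionFields() {
    return partitionFields;
  }
}

Package the class into a jar, drop it on the connector's classpath, and point the config at it with partitioner.class=com.example.MultiFieldPartitioner and partition.field.name=mydate,hour (again, the class name is just a placeholder).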

Upvotes: 1
