Reputation: 159
I have been trying to use Kafka Connect to stream data into HDFS with Hive integration turned on.
My use case requires me to use the "FieldPartitioner" as the partitioner class.
My problem is that I am unable to partition on multiple fields.
Example JSON:
{
"_id": "582d666ff6e02edad83cae28",
"index": "ENAUT",
"mydate": "03-01-2016",
"hour": 120000,
"balance": "$2,705.80"
}
I want to have partitions based on 'mydate' and 'hour'.
I tried the following configuration:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=fieldPartition_test_hdfs
hdfs.url=hdfs://quickstart.cloudera:8020
flush.size=3
partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
partition.field.name={mydate,hour}
locale=en
timezone=GMT
hive.database=weblogs
hive.integration=true
hive.metastore.uris=thrift://quickstart.cloudera:9083
schema.compatibility=BACKWARD
I also tried specifying partition.field.name as
partition.field.name={'mydate','hour'}
and
partition.field.name=mydate,hour
and many more such combinations.
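For reference, the single-field form of the same setting (which is presumably the only syntax FieldPartitioner accepts, given the behavior I am seeing) looks like this:
partition.field.name=mydate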
Any help on the issue would be greatly appreciated.
Thanks.
Upvotes: 0
Views: 767
Reputation: 159
I tried this every way possible and later started digging into the source code.
The code for FieldPartitioner is here!
And the last commit to the file, here, shows "Revert 'support multi partition fields'" (3 months ago), so support for multiple partition fields was added and then reverted.
Please do let me know if you guys have any other solution.
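In the meantime, one possible workaround is a custom partitioner that splits partition.field.name on commas and builds a nested Hive-style path. The sketch below is untested; it assumes the io.confluent.connect.hdfs.partitioner.Partitioner interface as it appears in the linked source, and the package and class name (com.example.MultiFieldPartitioner) are made up for illustration.

package com.example; // hypothetical package

import io.confluent.connect.hdfs.partitioner.Partitioner;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical partitioner: treats partition.field.name as a comma-separated
// list and encodes records into paths like mydate=03-01-2016/hour=120000.
public class MultiFieldPartitioner implements Partitioner {

  private List<String> fieldNames;
  private final List<FieldSchema> partitionFields = new ArrayList<>();

  @Override
  public void configure(Map<String, Object> config) {
    // Same config key FieldPartitioner reads, but split on commas here.
    String names = (String) config.get("partition.field.name");
    fieldNames = Arrays.asList(names.split(","));
    for (String name : fieldNames) {
      partitionFields.add(
          new FieldSchema(name, TypeInfoFactory.stringTypeInfo.toString(), ""));
    }
  }

  @Override
  public String encodePartition(SinkRecord sinkRecord) {
    // Assumes the record value is a Struct (i.e. data arrives with a schema).
    Struct struct = (Struct) sinkRecord.value();
    StringBuilder sb = new StringBuilder();
    for (String name : fieldNames) {
      if (sb.length() > 0) {
        sb.append("/");
      }
      // String.valueOf keeps the sketch short; the real FieldPartitioner
      // switches on the field's schema type instead.
      sb.append(name).append("=").append(String.valueOf(struct.get(name)));
    }
    return sb.toString();
  }

  @Override
  public String generatePartitionedPath(String topic, String encodedPartition) {
    return topic + "/" + encodedPartition;
  }

  @Override
  public List<FieldSchema> partitionFields() {
    return partitionFields;
  }
}

If that interface matches your version of the connector, you would build this into a jar, drop it on the Connect worker's classpath, and point the sink at it with partitioner.class=com.example.MultiFieldPartitioner.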
Upvotes: 1