Reputation: 11
I am using PySpark to write data to a Hudi table with the following options:
hudi_options = {
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.hive_sync.database': 'dingpan_test_mrs',
    'hoodie.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'modified_date',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.upsert.shuffle.parallelism': 1,
    'hoodie.insert.shuffle.parallelism': 1
}
(new_data.write
    .format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save(obs_path))
The write fails with this error message:
Could not load key generator class org.apache.hudi.keygen.ComplexKeyGenerator.
How should I adjust my code?
Upvotes: 1
Views: 22
Reputation: 816
You don't actually need ComplexKeyGenerator. It is only needed when your record key and/or partition path consists of multiple columns.
You have a single column for your record key (id in this case) and no partition field specified, so everything will go into the Hudi default partition.
So in your case SimpleKeyGenerator is enough. That is the default key generator, so you can omit the parameter altogether.
Remove this line: 'hoodie.datasource.write.keygenerator.class':'org.apache.hudi.keygen.ComplexKeyGenerator',
and it should work.
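For reference, here is a minimal sketch of the adjusted write, assuming the rest of your options stay as posted. The only change is that the key generator entry is gone; `write_to_hudi` is a hypothetical helper name I made up for illustration, and `new_data` and `obs_path` are the variables from your question.

```python
# Options from the question with the key generator class removed,
# so Hudi falls back to its default key generator.
hudi_options = {
    'hoodie.datasource.hive_sync.database': 'dingpan_test_mrs',
    'hoodie.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'modified_date',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.upsert.shuffle.parallelism': 1,
    'hoodie.insert.shuffle.parallelism': 1,
}


def write_to_hudi(df, path, options):
    """Hypothetical helper: append a Spark DataFrame to the Hudi table at `path`."""
    (df.write
        .format("hudi")
        .options(**options)
        .mode("append")
        .save(path))
```

You would then call `write_to_hudi(new_data, obs_path, hudi_options)` in place of your original write chain.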
Some documentation on key generators: https://hudi.apache.org/docs/key_generation/#simple
Upvotes: 0