lysqlq

PySpark: Could not load key generator class org.apache.hudi.keygen.ComplexKeyGenerator

I use PySpark to write data to a Hudi table with the following options:

hudi_options = {
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.hive_sync.database': 'dingpan_test_mrs',
    'hoodie.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.table.name': 'dwd_design_designasset_function_offline_wide',
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'modified_date',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.upsert.shuffle.parallelism': 1,
    'hoodie.insert.shuffle.parallelism': 1,
}

# Parentheses let the chained calls span several lines
(new_data.write
    .format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save(obs_path))

The write fails with this error message:

Could not load key generator class org.apache.hudi.keygen.ComplexKeyGenerator.

How should I adjust my code?


Answers (1)

mattyx17

You don't actually need ComplexKeyGenerator. It is only needed when your record key and/or partition path consist of multiple columns.
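For contrast, here is a minimal sketch of a configuration where ComplexKeyGenerator would make sense: the record key and the partition path each combine two columns. The column names (order_id, line_no, region, dt) are hypothetical; multiple fields are passed to the Hudi write options as a comma-separated list.

```python
# Hypothetical table: record key spans (order_id, line_no) and the
# partition path spans (region, dt). Column names are illustrative only.
complex_key_options = {
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.ComplexKeyGenerator",
    # Multiple fields are given as a comma-separated list
    "hoodie.datasource.write.recordkey.field": "order_id,line_no",
    "hoodie.datasource.write.partitionpath.field": "region,dt",
}

# Split the record key setting to see the individual key columns
record_key_fields = complex_key_options[
    "hoodie.datasource.write.recordkey.field"
].split(",")
print(record_key_fields)  # ['order_id', 'line_no']
```

With a single record key column such as id, none of this multi-column setup applies.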

You have a single record key column (id) and no partition path field specified, so all records will go into Hudi's default partition.

In your case SimpleKeyGenerator is enough, and since it is the default key generator, you can omit that option altogether.

So remove the 'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator' entry from hudi_options, and the write should work.
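A sketch of the question's options with only that entry removed; every other key and value is copied unchanged. It assumes, as in the question, that no partition path field is wanted.

```python
# The question's Hudi options minus 'hoodie.datasource.write.keygenerator.class',
# so Hudi falls back to its default key generator.
hudi_options = {
    "hoodie.datasource.hive_sync.database": "dingpan_test_mrs",
    "hoodie.table.name": "dwd_design_designasset_function_offline_wide",
    "hoodie.datasource.write.table.name": "dwd_design_designasset_function_offline_wide",
    # Single-column record key, so no composite key generator is needed
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "modified_date",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.upsert.shuffle.parallelism": 1,
    "hoodie.insert.shuffle.parallelism": 1,
}
```

The dict is then passed to the same write call as in the question, via .options(**hudi_options).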

Some documentation on key generators: https://hudi.apache.org/docs/key_generation/#simple

