I'm trying to fine-tune a MusicGen model using the Hugging Face transformers package, following the code from this repo.
I created two Colab notebooks to demonstrate the two errors I'm facing.
ValueError: Make sure to set the decoder_start_token_id attribute of the model's configuration.
The model starts training, but a problem emerges in the shift_tokens_right function of the MusicgenUnconditionalInput class, where the value of the decoder_start_token_id attribute appears to be None.
Setting model.config.decoder_start_token_id = model.decoder.config.bos_token_id didn't work, as can be seen here.
This problem can be solved manually by hard-coding decoder_start_token_id to 2048 in the shift_tokens_right function, but I don't think that's the ideal solution. After "solving" it this way, the next error appears.
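To make the first error concrete, here is a minimal pure-Python sketch of what a shift_tokens_right-style helper does. It is modeled on the common encoder-decoder helper in transformers and is an assumption, not the MusicGen implementation (which works on tensors with a codebook dimension). It shows why a None start token raises the ValueError and what hard-coding 2048 changes.

```python
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Shift each row of label ids one step right (pure-Python sketch, not the library code)."""
    if decoder_start_token_id is None:
        # Same condition that produces the ValueError in the question.
        raise ValueError(
            "Make sure to set the decoder_start_token_id attribute of the model's configuration."
        )
    if pad_token_id is None:
        raise ValueError("Make sure to set the pad_token_id attribute of the model's configuration.")
    shifted = []
    for row in labels:
        # Prepend the start token and drop the last label so the length is unchanged.
        new_row = [decoder_start_token_id] + list(row[:-1])
        # Labels of -100 (ignored by the loss) become the pad token in the decoder input.
        shifted.append([pad_token_id if t == -100 else t for t in new_row])
    return shifted


print(shift_tokens_right([[5, 6, 7]], pad_token_id=2048, decoder_start_token_id=2048))
```

With the start token set to 2048 the decoder input becomes [[2048, 5, 6]]; with None the helper fails before any shifting happens, which is why the token id has to reach whichever config object the forward pass actually reads.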
AttributeError: 'MusicgenSinusoidalPositionalEmbedding' object has no attribute 'offset'
This one occurs in the forward function of the MusicgenSinusoidalPositionalEmbedding class, and is again about an attribute the class simply doesn't have.
It can also be solved by manually setting the value. Although I'm not 100% sure, an offset value of 1 seems to work, probably due to the codebook pattern.
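I can't confirm how the library intends offset to be used, so the sketch below is only a generic sinusoidal embedding (cosine half, then sine half), not the transformers implementation. The offset argument is hypothetical: it adds a constant to every position index, which is what setting offset = 1 would do if forward uses it that way.

```python
import math


def sinusoidal_embedding(position, dim):
    """Sinusoidal embedding of one integer position: cosine half, then sine half."""
    half = dim // 2
    scale = math.log(10000.0) / (half - 1)
    angles = [position * math.exp(-scale * i) for i in range(half)]
    return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]


def embed_positions(seq_len, dim, past_key_values_length=0, offset=0):
    """Embed positions past_key_values_length + offset + i for i in range(seq_len).

    `offset` is a hypothetical parameter: it shifts every position index by a constant.
    """
    start = past_key_values_length + offset
    return [sinusoidal_embedding(start + i, dim) for i in range(seq_len)]


no_offset = embed_positions(4, dim=8)
with_offset = embed_positions(4, dim=8, offset=1)
# With offset=1 the first token receives the embedding of position 1.
print(with_offset[0] == no_offset[1])
```

Under this reading, an offset of 1 just means the first step is embedded as position 1 instead of 0, which would be consistent with a codebook delay pattern that prepends one start step.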
After manually setting both values in the package code, training works.
During training, it keeps printing:
It is strongly recommended to pass the sampling_rate argument to this function. Failing to do so can result in silent errors that might be hard to debug.
It would also be nice to know how to stop this.