Felipe Marra

Finetune MusicGen on Huggingface transformers: Error with decoder_start_token_id and Sinusoidal Positional offset

Introduction

I'm trying to finetune a MusicGen model using the Huggingface transformers package following the code from this repo.

Two Colab notebooks were created to demonstrate the two errors I'm facing.

First error: start token id

ValueError: Make sure to set the decoder_start_token_id attribute of the model's configuration.

Colab link

The model starts training, but then fails in the shift_tokens_right function of the MusicgenUnconditionalInput class, where the decoder_start_token_id attribute turns out to be None.

Setting model.config.decoder_start_token_id = model.decoder.config.bos_token_id didn't work out, as can be seen here.

The problem can of course be worked around by hard-coding decoder_start_token_id to 2048 inside shift_tokens_right, but I don't think that's the right fix. After "solving" it this way, the next error appears.
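To see why the error fires, here is a plain-Python sketch of what a shift_tokens_right helper typically does (the real transformers version operates on torch tensors; the names and shapes here are illustrative only): the decoder labels are shifted one step to the right and the start token is placed in front, so a None start token makes the shift impossible.

```python
def shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id):
    """Illustrative sketch: shift labels one step right, prepend the start token."""
    if decoder_start_token_id is None:
        # This is the condition that raises the ValueError in transformers.
        raise ValueError(
            "Make sure to set the decoder_start_token_id attribute "
            "of the model's configuration."
        )
    shifted = []
    for row in input_ids:
        # Drop the last label, prepend the start token, and replace the
        # -100 loss-masking value with the pad token.
        new_row = [decoder_start_token_id] + row[:-1]
        shifted.append([pad_token_id if t == -100 else t for t in new_row])
    return shifted

# With MusicGen's pad/start id of 2048, one batch row shifts like this:
print(shift_tokens_right([[5, 6, -100]], pad_token_id=2048,
                         decoder_start_token_id=2048))
# [[2048, 5, 6]]
```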

Second error: Sinusoidal Positional offset

AttributeError: 'MusicgenSinusoidalPositionalEmbedding' object has no attribute 'offset'

Colab link

This one occurs in the forward function of the MusicgenSinusoidalPositionalEmbedding class and is another case of an attribute the class simply doesn't have.

It can also be worked around by setting the value manually. Although I'm not 100% sure, an offset of 1 seems to work, probably due to the codebook pattern.
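Rather than editing the installed package files, the missing attribute could also be patched at runtime before training starts. A minimal defensive sketch (the path model.decoder.model.decoder.embed_positions to the positional-embedding module is an assumption and may differ across transformers versions):

```python
def ensure_offset(embed_module, offset=1):
    """Defensively give a module an `offset` attribute if it is missing."""
    if not hasattr(embed_module, "offset"):
        embed_module.offset = offset
    return embed_module

# In the Colab this would be applied before training, e.g. (path is an
# assumption, check it against your transformers version):
# ensure_offset(model.decoder.model.decoder.embed_positions, offset=1)
```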

After "solving"

After manually patching those values in the package code, training runs.

During training, it keeps printing:

It is strongly recommended to pass the sampling_rate argument to this function. Failing to do so can result in silent errors that might be hard to debug

It would also be nice to know how to silence this warning.
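That warning comes from the audio feature extractor being called without a sampling_rate argument. A hedged sketch of a preprocessing call that passes it explicitly (32 kHz matches MusicGen's audio codec; the processor keyword names follow the usual transformers conventions, but treat them as assumptions for your setup):

```python
SAMPLING_RATE = 32_000  # MusicGen's audio codec operates at 32 kHz

def preprocess(processor, audio_batch, sampling_rate=SAMPLING_RATE):
    # Passing sampling_rate explicitly lets the feature extractor verify the
    # audio matches the model's expected rate and silences the
    # "strongly recommended to pass the sampling_rate" warning.
    return processor(
        audio=audio_batch,
        sampling_rate=sampling_rate,
        padding=True,
        return_tensors="pt",
    )

# Usage sketch (assuming an AutoProcessor loaded for facebook/musicgen-small):
# processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
# inputs = preprocess(processor, list_of_waveforms)
```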
