Reputation: 1
I have been trying to train a model to generate sequences from monophonic musical scores. Around the internet I have found some examples of people doing this with character-level LSTM networks and ABC music notation (many use Karpathy's implementation in Torch: http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
While this seems reasonably effective, the sequences do not encode the musical information as compactly as they could. My thought was to represent the music directly as a sequence of notes. However, a note has multiple features: pitch, octave, duration, whether it is connected to the next note, etc. I am not sure how to properly represent this information as a feature vector, and I have not found much information on the subject.
My dataset has quite limited diversity in note pitches and durations: it contains maybe 3 octaves, 10 different note durations, and only the 4/4 time signature. However, representing each distinct combination of these attributes as its own input symbol would result in a huge feature vector.
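To give an idea of what I mean by a feature vector, here is a rough Python sketch of one option I considered, with one small one-hot block per attribute concatenated together (the attribute sets are made up, just matching the sizes above). That would give 12 + 3 + 10 + 1 = 26 dimensions instead of 12 * 3 * 10 * 2 = 720 distinct combinations:

```python
import numpy as np

# Illustrative attribute sets (not my actual dataset).
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
OCTAVES = [3, 4, 5]            # 3-octave range
DURATIONS = list(range(10))    # 10 duration classes

def encode_note(pitch, octave, duration, tied):
    """Return one feature vector: one one-hot block per attribute,
    concatenated, plus a single binary flag for the tie."""
    vec = np.zeros(len(PITCHES) + len(OCTAVES) + len(DURATIONS) + 1)
    vec[PITCHES.index(pitch)] = 1.0
    vec[len(PITCHES) + OCTAVES.index(octave)] = 1.0
    vec[len(PITCHES) + len(OCTAVES) + DURATIONS.index(duration)] = 1.0
    vec[-1] = 1.0 if tied else 0.0
    return vec

print(encode_note("D#", 4, 2, tied=True).shape)  # (26,)
```

But I am not sure whether something like this is the right approach.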
Any advice would be greatly appreciated!
Upvotes: 0
Views: 131
Reputation: 2827
As long as you can encode and decode your training examples in a text format, I think you can adapt and use the character-level LSTM approach.
For example, you could represent every note by a letter (A-G), use + or - for sharp or flat, followed by a code like x, y, z for the octave, a digit 0 to 9 for the duration value, and a space to mark whether it is connected to the next note or not (writing connected notes back-to-back with no space).
Like this blues riff:
Cx1 E-x1 Cx1 E-x1 Fx4 Gx2 B-x2 Cy1Cy1Cy1Cy1, etc.
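As a rough sketch of what encoding and decoding could look like (in Python; the octave codes and the no-space-means-connected convention are just the illustrative choices from above):

```python
import re

OCTAVE_CODES = {"x": 0, "y": 1, "z": 2}

# One note: letter, optional +/- accidental, octave code, duration digit.
NOTE_RE = re.compile(r"([A-G])([+-]?)([xyz])(\d)")

def decode(riff):
    """Parse a riff string into (pitch, accidental, octave, duration, tied)
    tuples; notes written back-to-back within a chunk are marked as tied."""
    notes = []
    for chunk in riff.split(" "):
        tokens = NOTE_RE.findall(chunk)
        for i, (letter, accidental, octave, duration) in enumerate(tokens):
            tied = i < len(tokens) - 1  # connected to the next note in chunk
            notes.append((letter, accidental, OCTAVE_CODES[octave],
                          int(duration), tied))
    return notes

print(decode("Cx1 E-x1 Fx4 Cy1Cy1"))
```

The point is just that the format round-trips cleanly to and from text, which is all the character-level model needs.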
With enough training data the RNN would learn the syntax and grammar of this notation, as well as patterns and relationships between the notes that lead to "musical" sequences... depending on what kind of music you train it on.
Upvotes: 1