Reputation: 1
I have been trying to train a model to generate sequences from monophonic musical scores. Around the internet I have found some examples of people doing this with character-level LSTM networks and ABC music notation (many use Karpathy's implementation in Torch: http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
While this seems reasonably effective, the sequences do not encode the musical information as compactly as they could. My thought was to represent the music directly as a sequence of notes. However, a note has multiple features: pitch, octave, duration, whether it is connected to the next note, etc. I am not sure how to properly represent this information as a feature vector, and I have not found much information on the subject.
My dataset has quite limited diversity in note pitches and durations: it contains maybe 3 octaves, 10 different note durations, and only the 4/4 time signature. However, representing each distinct combination of these attributes as its own input symbol would result in a huge feature vector.
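To give an idea of what I mean by a feature vector, here is a rough Python sketch of one option I considered, with one small one-hot block per attribute concatenated together (the attribute sets are made up, just matching the sizes above). That would give 12 + 3 + 10 + 1 = 26 dimensions instead of 12 * 3 * 10 * 2 = 720 distinct combinations:

```python
import numpy as np

# Illustrative attribute sets (not my actual dataset).
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
OCTAVES = [3, 4, 5]            # 3-octave range
DURATIONS = list(range(10))    # 10 duration classes

def encode_note(pitch, octave, duration, tied):
    """Return one feature vector: one one-hot block per attribute,
    concatenated, plus a single binary flag for the tie."""
    vec = np.zeros(len(PITCHES) + len(OCTAVES) + len(DURATIONS) + 1)
    vec[PITCHES.index(pitch)] = 1.0
    vec[len(PITCHES) + OCTAVES.index(octave)] = 1.0
    vec[len(PITCHES) + len(OCTAVES) + DURATIONS.index(duration)] = 1.0
    vec[-1] = 1.0 if tied else 0.0
    return vec

print(encode_note("D#", 4, 2, tied=True).shape)  # (26,)
```

But I am not sure whether something like this is the right approach.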
Any advice would be greatly appreciated!
Upvotes: 0
Views: 131
Reputation: 2827
As long as you can encode and decode your training examples in a text format, I think you can adapt and use the character-level LSTM approach.
For example, you could represent every note by a letter (A-G), use + or - for sharp or flat, followed by a code like x, y, z for the octave, a digit 0 to 9 for the duration value, and a space to mark whether it is connected to the next note or not (writing connected notes back-to-back with no space).
Like this blues riff:
Cx1 E-x1 Cx1 E-x1 Fx4 Gx2 B-x2 Cy1Cy1Cy1Cy1, etc.
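As a rough sketch of what encoding and decoding could look like (in Python; the octave codes and the no-space-means-connected convention are just the illustrative choices from above):

```python
import re

OCTAVE_CODES = {"x": 0, "y": 1, "z": 2}

# One note: letter, optional +/- accidental, octave code, duration digit.
NOTE_RE = re.compile(r"([A-G])([+-]?)([xyz])(\d)")

def decode(riff):
    """Parse a riff string into (pitch, accidental, octave, duration, tied)
    tuples; notes written back-to-back within a chunk are marked as tied."""
    notes = []
    for chunk in riff.split(" "):
        tokens = NOTE_RE.findall(chunk)
        for i, (letter, accidental, octave, duration) in enumerate(tokens):
            tied = i < len(tokens) - 1  # connected to the next note in chunk
            notes.append((letter, accidental, OCTAVE_CODES[octave],
                          int(duration), tied))
    return notes

print(decode("Cx1 E-x1 Fx4 Cy1Cy1"))
```

The point is just that the format round-trips cleanly to and from text, which is all the character-level model needs.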
With enough training data the RNN would learn the syntax and grammar of this notation, as well as patterns and relationships between the notes that lead to "musical" sequences... depending on what kind of music you train it on.
Upvotes: 1