RDR

Reputation: 21

Optical Music Recognition using Yolov3

I'm trying to train a model (YOLOv3) to detect various musical symbols on a sheet of music, but all the datasets suitable for this are built only from printed sheet music. Is there a way to adapt the model to handwritten symbols? Will pre-training Darknet-53 help with this? If I train Darknet-53 to recognize both handwritten and printed symbols, what effect will that have?

YOLOv3 architecture: [image]

Upvotes: 1

Views: 60

Answers (1)

user20488960

Reputation: 82

I agree with the previous commenters.

You can start by converting the image to grayscale (in case the handwritten notes are drawn in blue) and then evaluate a model trained on printed sheets on both 1) printed sheets and 2) handwritten sheets, to see how well it transfers.
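A minimal sketch of the grayscale step, using a plain NumPy luminance conversion (function name and weights are my choice; in practice you'd likely just call `cv2.cvtColor`):

```python
import numpy as np

def to_grayscale(img: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to grayscale.

    Uses the standard Rec. 601 luminance weights, so handwritten
    notes drawn in blue ink land on the same intensity scale as
    black printed symbols.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return (img.astype(np.float64) @ weights).astype(np.uint8)
```

After this step, both printed and handwritten sheets feed into the detector as single-channel (or channel-replicated) images, removing ink color as a source of domain shift.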

If you annotate a dataset of ~30-50 sheets, you can fine-tune a detector. That said, you might need a very large dataset to train a high-quality detector (given the variation across music sheets), unless you want to focus on a fairly controlled setting. Another option is to create a semi-synthetic dataset of handwritten notes by replacing each printed note with one of its handwritten images, though it might be hard to extend this to complex music sheets.
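The semi-synthetic idea can be sketched like this: given the printed-sheet annotations, paste a randomly chosen handwritten glyph crop over each printed symbol's bounding box. The function name, the `(x, y, w, h)` box layout, and the assumption that glyph crops are pre-scaled to the box size are all mine, not from the question:

```python
import random
import numpy as np

def paste_glyph(sheet: np.ndarray, box: tuple, glyphs: list) -> np.ndarray:
    """Replace one printed symbol's region with a handwritten glyph.

    `sheet` is a grayscale page, `box` is (x, y, w, h) from the
    printed-sheet annotation, and `glyphs` is a list of grayscale
    handwritten crops already scaled to (h, w).  Returns a copy so
    the original printed sheet is left untouched.
    """
    x, y, w, h = box
    out = sheet.copy()
    out[y:y + h, x:x + w] = random.choice(glyphs)
    return out
```

Because the boxes come from the printed annotations, the pasted glyphs inherit those labels for free, which is exactly what makes the dataset cheap to build.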

Training on both printed and handwritten data together might also work.

Longer term, an approach like CycleGAN (or similar) could help generate handwritten examples from printed ones, since it learns from unpaired sets of printed and handwritten music sheets (no additional annotation required).

As for position detection, you can either:

  1. detect the note head directly (in that case you only need ground-truth annotations for the heads plus the kind of note, not the full note symbol), which should work well with YOLO; or
  2. if you can't get head annotations, determine whether the head sits at the top or the bottom of the symbol (e.g. for quarter notes), which means you'll need two separate classes in the annotation (something you might also want in option 1, if you want to keep the stem-direction information). In case 2, though, you'll need very accurate bounding-box annotation so that you can recover the head position precisely.
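Option 2 can be sketched as a small post-processing step: given a full-note bounding box and a stem-direction class, estimate the head center. The class names, the `(x, y, w, h)` convention, and the `head_frac` value are assumptions for illustration:

```python
def head_position(box, cls, head_frac=0.25):
    """Estimate the note-head center from a full-note bounding box.

    `box` is (x, y, w, h) with y growing downward; `cls` is either
    "stem_up" or "stem_down" (hypothetical class names).  For a
    stem-up note the head sits at the bottom of the box, for a
    stem-down note at the top; `head_frac` is the assumed fraction
    of the box height occupied by the head.
    """
    x, y, w, h = box
    cx = x + w / 2
    if cls == "stem_up":
        cy = y + h * (1 - head_frac / 2)  # center of the bottom head band
    else:
        cy = y + h * (head_frac / 2)      # center of the top head band
    return cx, cy
```

This is also why the bounding boxes must be tight in option 2: any slack in the box height shifts the estimated head center directly.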

Start with some simple examples, where the notes have no or little intersection.

Literature could also help if you are not familiar with the topic, depending on how far you want to go.

Good luck!

Upvotes: 0
