Ahasanul Haque

Reputation: 11134

Object Detection with YOLOV7 on custom dataset

I am trying to predict bounding boxes on a custom dataset using transfer learning on yolov7 pretrained model.

My dataset contains 34 scenes for training, 2 validation scenes and 5 test scenes. Nothing much happens in a scene: the camera just moves 60-70 degrees around the objects on a table/flat surface and scales/tilts a bit. So, even though I have around 20k training images (extracted from the 34 scenes), the images I get from each scene are almost the same, differing only by a kind of augmentation effect (scaling, rotation, occlusion and tilting coming from the camera movement).

Here is an example of a scene (first frame and last frame)


Now, I tried different things.

  1. transfer learning with Pretrained yolov7 p5 model
  2. transfer learning with Pretrained yolov7 p5 model (with freezing the extractor, 50 layers)
  3. transfer learning with Pretrained yolov7 tiny model
  4. transfer learning with Pretrained yolov7 tiny model (with freezing the extractor, 28 layers)
  5. full training yolov7 p5 network
  6. full training yolov7 tiny network.

Some of these kind of work (they predict the bounding boxes with 100% precision but lower recall, and sometimes with the wrong class label), but the biggest problem I am facing is that the validation objectness loss never goes down, no matter which approach I try. This happens right from the start, so I am not sure whether I am overfitting or not.

The graph below is from transfer learning with the tiny model and a frozen backbone.

[loss curves: transfer learning with the pretrained yolov7 tiny model, extractor frozen, 28 layers]

Any suggestions of how to solve the problem and get a better result?

Upvotes: 0

Views: 2320

Answers (2)

tCot

Reputation: 357

Balance the dataset by oversampling (effectively copying) images that contain the class which appears least often in the dataset.

To implement this in YOLOv7: copy the function below into "yolov7/utils/datasets.py" and, inside the create_dataloader function, replace the line

sampler = torch.utils.data.distributed.DistributedSampler(dataset) if rank != -1 else None

with

sampler = get_weighted_samples(labels=dataset.labels, upsampled_class=1)

Here 1 is the label of the class that is underrepresented.

# numpy and torch are already imported in yolov7/utils/datasets.py; the sampler is not.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def get_weighted_samples(labels, upsampled_class: int = 1) -> WeightedRandomSampler:
    # labels: one array per image, with the class id in column 0 (as in dataset.labels).
    filtered_dataset = list(filter(lambda item: (item[:, 0] == upsampled_class).any(), labels))
    percent = len(filtered_dataset) / len(labels)

    # Images containing the underrepresented class get the larger weight (1 - percent),
    # so the sampler draws them more often; all other images get the smaller weight.
    weights = [1 - percent if (item[:, 0] == upsampled_class).any() else percent for item in labels]
    weights = np.array(weights)

    sampler = WeightedRandomSampler(torch.from_numpy(weights), len(weights))
    return sampler
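
To check that the weighting behaves as intended before touching create_dataloader, here is a minimal, self-contained sketch using synthetic labels (the class id is assumed to sit in column 0 of each label array, as in the function above):

import numpy as np

# Synthetic YOLO-style labels: class id in column 0, then x, y, w, h.
# 90 images contain only class 0; 10 images contain the rare class 1.
labels = [np.array([[0, 0.5, 0.5, 0.1, 0.1]]) for _ in range(90)]
labels += [np.array([[1, 0.5, 0.5, 0.1, 0.1]]) for _ in range(10)]

sampler = get_weighted_samples(labels=labels, upsampled_class=1)

# Draw one "epoch" of indices and count how often rare-class images are picked.
drawn = list(sampler)
rare_hits = sum(1 for i in drawn if (labels[i][:, 0] == 1).any())
print(f"{rare_hits} of {len(drawn)} draws contain class 1")  # roughly half, instead of ~10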

Upvotes: 0

Mercury

Reputation: 4171

I would suggest you thoroughly review your dataset, to start.

  • Check the class distributions (a small counting sketch follows after this list).

    • How many classes do you have, and what are the counts of objects of each class in the training set?
    • What are the counts in the validation set? Are the ratios approximately similar or different?
    • Is any class lacking examples (i.e. too few by proportion)?
    • Do you have enough background samples? (Images where no desired object is present.)
  • Check your dataset's annotations. Are your objects labelled correctly? If you have time, take 1000 random images, plot the bounding boxes on them, and manually check the labels (see the plotting sketch after this list). This is a sanity check, and sometimes you can find wrongly drawn boxes and incorrect labels.

  • Another issue could be the lack of variety, as you have mentioned. You have 20K images in your training set, but there are possibly at most ~34 unique mugs inside (assuming mug is a class). Maybe all those mugs are white, blue, or brown, but in your validation set the mug is bright red. (I hope you get the idea.)

  • Try playing around with the hyperparameters a bit. Explore a slightly lower or slightly higher learning rate, a longer warmup, or stronger weight decay. Assuming you are using one of the default hyperparameter files, try increasing the mosaic, copy_paste, flipud/fliplr etc. probabilities as well. If stronger augmentation gives better results, that is a hint that the dataset is redundant and lacks variety.
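
For the class-distribution check, a quick sketch like the one below can summarise the counts per split. It assumes YOLO-format .txt label files (one per image, class id as the first value on each line); the directory paths are placeholders.

from collections import Counter
from pathlib import Path

def class_counts(label_dir: str) -> Counter:
    # Count object instances per class id across YOLO-format .txt label files.
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

# Hypothetical directory layout; adjust to your dataset.
for split in ("train", "val"):
    counts = class_counts(f"dataset/labels/{split}")
    total = sum(counts.values())
    print(split, {cls: f"{n} ({n / total:.1%})" for cls, n in sorted(counts.items())})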
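
For the annotation sanity check, a sketch along these lines (OpenCV-based; the paths are again placeholders) draws the labelled boxes on a random subset of images so you can eyeball them:

import random
from pathlib import Path

import cv2

IMG_DIR = Path("dataset/images/train")   # hypothetical paths
LBL_DIR = Path("dataset/labels/train")
OUT_DIR = Path("label_check")
OUT_DIR.mkdir(exist_ok=True)

images = sorted(IMG_DIR.glob("*.jpg"))
for img_path in random.sample(images, k=min(1000, len(images))):
    img = cv2.imread(str(img_path))
    h, w = img.shape[:2]
    lbl_path = LBL_DIR / (img_path.stem + ".txt")
    if lbl_path.exists():
        for line in lbl_path.read_text().splitlines():
            cls, xc, yc, bw, bh = line.split()[:5]
            # YOLO format is normalised (xc, yc, w, h); convert to pixel corners.
            xc, yc, bw, bh = float(xc) * w, float(yc) * h, float(bw) * w, float(bh) * h
            p1 = (int(xc - bw / 2), int(yc - bh / 2))
            p2 = (int(xc + bw / 2), int(yc + bh / 2))
            cv2.rectangle(img, p1, p2, (0, 255, 0), 2)
            cv2.putText(img, cls, (p1[0], p1[1] - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite(str(OUT_DIR / img_path.name), img)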

Upvotes: 2
