Reputation: 11134
I am trying to predict bounding boxes on a custom dataset using transfer learning from a pretrained YOLOv7 model.
My dataset contains 34 training scenes, 2 validation scenes, and 5 test scenes. Not much happens in a scene: the camera just moves 60-70 degrees around objects on a table/flat surface and scales/tilts a bit. So even though I have around 20k training images (extracted from the 34 scenes), the images from each scene are almost identical, differing only by a kind of built-in augmentation effect (scaling, rotation, occlusion, and tilting from the camera movement).
Here is an example of a scene (first frame and last frame)
Now, I tried different things.
Some of them kind of work (they correctly predict the bounding boxes with 100% precision, but lower recall, and sometimes with the wrong class label). The biggest problem I am facing is that the validation objectness loss never goes down, no matter which approach I try. This happens right from the start, so I am not sure whether I am overfitting or not.
The graph below is from transfer learning on the tiny model with a frozen backbone.
Any suggestions on how to solve this problem and get a better result?
Upvotes: 0
Views: 2320
Reputation: 357
Balance the dataset by oversampling the images that contain the least represented class.
To implement this in YOLOv7, copy the function below into yolov7/utils/datasets.py, then in the create_dataloader function replace the line

    sampler = torch.utils.data.distributed.DistributedSampler(dataset) if rank != -1 else None

with

    sampler = get_weighted_samples(labels=dataset.labels, upsampled_class=1)

Here 1 is the label of the class that is underrepresented in the dataset.
from typing import List
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def get_weighted_samples(labels: List[np.ndarray], upsampled_class: int = 1) -> WeightedRandomSampler:
    # Which images contain at least one box of the minority class
    contains = np.array([(item[:, 0] == upsampled_class).any() for item in labels])
    percent = contains.mean()  # fraction of images with the minority class
    # Minority-class images get the larger weight (1 - percent) so they are
    # drawn more often; the other way around would downsample them instead
    weights = np.where(contains, 1.0 - percent, percent)
    return WeightedRandomSampler(torch.from_numpy(weights), len(weights))
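A quick standalone sanity check of the sampler, using toy labels with a hypothetical 10/90 class split: with the weights above, the minority class should be drawn roughly half the time.

    import numpy as np

    # 10 images containing only class 1 (minority), 90 containing only class 0
    labels = [np.array([[1, 0.5, 0.5, 0.1, 0.1]])] * 10 + \
             [np.array([[0, 0.5, 0.5, 0.1, 0.1]])] * 90
    sampler = get_weighted_samples(labels, upsampled_class=1)
    drawn = list(sampler)
    print(sum(i < 10 for i in drawn) / len(drawn))  # ~0.5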
Upvotes: 0
Reputation: 4171
I would suggest you thoroughly review your dataset, to start.
Check the class distributions.
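For that check, a minimal sketch that counts class frequencies straight from YOLO-format label files (the labels/train path is an assumption; adjust it to your layout):

    from collections import Counter
    from pathlib import Path

    counts = Counter()
    for txt in Path("labels/train").glob("*.txt"):  # hypothetical label dir
        for row in txt.read_text().splitlines():
            if row.strip():
                counts[int(row.split()[0])] += 1  # class id is the first column
    print(counts.most_common())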
Check your dataset's annotations. Are your objects labelled correctly? If you have time, take 1000 random images, plot the bounding boxes on them, and manually check the labels. This is a sort of sanity check, and sometimes you can find wrongly drawn boxes and incorrect labels.
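A rough sketch of that sanity check, assuming the usual YOLO layout of images/train with matching labels/train .txt files (the paths are assumptions; boxes are "class x_center y_center width height", normalized):

    import random
    from pathlib import Path
    import cv2

    img_dir, lbl_dir = Path("images/train"), Path("labels/train")  # hypothetical layout
    out_dir = Path("label_check")
    out_dir.mkdir(exist_ok=True)

    images = sorted(img_dir.glob("*.jpg"))
    for img_path in random.sample(images, min(1000, len(images))):
        img = cv2.imread(str(img_path))
        h, w = img.shape[:2]
        lbl = lbl_dir / (img_path.stem + ".txt")
        if lbl.exists():
            for row in lbl.read_text().splitlines():
                cls, xc, yc, bw, bh = row.split()
                # Denormalize the YOLO box to pixel corners
                xc, yc, bw, bh = float(xc) * w, float(yc) * h, float(bw) * w, float(bh) * h
                p1 = (int(xc - bw / 2), int(yc - bh / 2))
                p2 = (int(xc + bw / 2), int(yc + bh / 2))
                cv2.rectangle(img, p1, p2, (0, 255, 0), 2)
                cv2.putText(img, cls, p1, cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imwrite(str(out_dir / img_path.name), img)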
Another issue could be the lack of variety, as you have mentioned. You have 20K images in your training set, but possibly there are at most ~34 unique mugs in them (assuming mug is a class). Maybe all those mugs are white, blue, or brown, but in your validation set the mug is bright red. (I hope you get the idea.)
Try playing around with the hyperparameters a bit. Explore a slightly lower or slightly higher learning rate, a longer warmup, and stronger weight decay. I assume these are the settings you are using; try increasing the mosaic, copy-paste, and up-down flip probabilities as well (see the sketch below). If stronger augmentation gives better results, that is a hint that the problem is a redundant dataset that lacks variety.
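As a minimal sketch of where those knobs live, you can load the default hyperparameter file and bump the values programmatically. The key names follow yolov7's data/hyp.scratch.tiny.yaml; the values below are only illustrative starting points, not tuned recommendations:

    import yaml

    # Load the default tiny-model hyperparameters shipped with the yolov7 repo
    with open("data/hyp.scratch.tiny.yaml") as f:
        hyp = yaml.safe_load(f)

    hyp["lr0"] = 0.005           # slightly lower initial learning rate
    hyp["warmup_epochs"] = 5.0   # longer warmup
    hyp["weight_decay"] = 0.001  # stronger weight decay
    hyp["mosaic"] = 1.0          # keep mosaic on
    hyp["mixup"] = 0.15          # stronger mixup
    hyp["copy_paste"] = 0.3      # more copy-paste augmentation
    hyp["flipud"] = 0.3          # enable up-down flips

    with open("data/hyp.custom.yaml", "w") as f:
        yaml.safe_dump(hyp, f)

    # then: python train.py --hyp data/hyp.custom.yaml ...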
Upvotes: 2