Image classification of house build year: regression or classification?

Question

Let's say I want to find out when a house was built by training a CNN on a training set of housing images with the following mapping:

Input pictures [244, 244, 3] -> Output year [1850, 1851, ..., 2018]

It's a supervised learning problem, so the labels (years from 1850 to 2018) are known.

Should I build a classification or a regression model to solve this problem? I'm unsure because I don't have training images for every year from 1850 to 2018, but I want the model to be able to output any of those years for new pictures I give it after training is done. That would point me to regression.

On the other hand, I don't want the model to output a continuous Y, because I'm interested in the concrete year the building was built, not an in-between value.

The answer to this may be super simple but I can't figure it out.

Maxim · Accepted Answer

This is clearly a regression problem. If you were to treat each year as a separate class, classes 1900 and 2017 would be equally close to 2018 (the numerical value doesn't matter in classification). But obviously the two predictions, 2017 vs. 1900 when the true label is 2018, are very different. Also, a regression formulation will allow you to generalize to unseen years, as you stated yourself. This is practically impossible in classification if those classes aren't present in training.
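To make the distance argument concrete, here is a minimal Python sketch that scores the two predictions from the example under a classification-style loss and a regression-style loss. The 0-1 loss stands in for the classification view (cross-entropy with one-hot labels behaves similarly, since it only looks at the probability assigned to the true class). The function names are hypothetical illustrations, not part of any library.

```python
# Hypothetical true label and two candidate predictions from the example.
TRUE_YEAR = 2018
PREDICTIONS = [2017, 1900]


def zero_one_loss(pred: int, true: int) -> int:
    """Classification view: a prediction is either the right class or not."""
    return 0 if pred == true else 1


def absolute_error(pred: float, true: float) -> float:
    """Regression view: the penalty grows with the distance in years."""
    return abs(pred - true)


for pred in PREDICTIONS:
    print(f"pred={pred}: 0-1 loss={zero_one_loss(pred, TRUE_YEAR)}, "
          f"absolute error={absolute_error(pred, TRUE_YEAR)} years")
```

Both predictions get the same 0-1 loss of 1, while the absolute error is 1 year for 2017 and 118 years for 1900, which is exactly the information a classifier throws away.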

If your end result must be an integer, I'd suggest you implement an interpretation of the regression output. For example, it could return the rounded year if the output is within a certain tolerance of a whole year, or both adjacent years otherwise (when the model isn't sure):

regression_output=2000.23 -> result_year=2000
regression_output=2000.96 -> result_year=2001
regression_output=2000.45 -> result_year=2000/2001

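This way you'll have one more parameter, the tolerance, to tune. For example, tolerance=0.5 will make your model always sure, because every output is then within 0.5 of some whole year and gets rounded to a single value.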
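The interpretation step could be sketched in Python as follows, assuming the tolerance is the maximum allowed distance from the nearest whole year and is restricted to [0, 0.5]. The function name `interpret_year` and the default tolerance of 0.3 are illustrative choices, picked so that the three example outputs map as listed; they are not from any library.

```python
import math
from typing import Tuple, Union


def interpret_year(regression_output: float,
                   tolerance: float = 0.3) -> Union[int, Tuple[int, int]]:
    """Turn a continuous year prediction into a single year, or into the
    two adjacent years when the output is not within `tolerance` of either.

    Hypothetical helper: `tolerance` must lie in [0, 0.5]; 0.5 means the
    function always commits to one year (ties at exactly .5 go down).
    """
    if not 0.0 <= tolerance <= 0.5:
        raise ValueError("tolerance must be in [0, 0.5]")
    lower = math.floor(regression_output)
    frac = regression_output - lower  # distance above the lower year
    if frac <= tolerance:
        return lower                  # close enough to the lower year
    if frac >= 1.0 - tolerance:
        return lower + 1              # close enough to the upper year
    return (lower, lower + 1)         # model isn't sure: report both


for y in (2000.23, 2000.96, 2000.45):
    print(y, "->", interpret_year(y))
# 2000.23 -> 2000
# 2000.96 -> 2001
# 2000.45 -> (2000, 2001)
```

Calling `interpret_year(2000.45, tolerance=0.5)` returns 2000 instead of a pair, which is the "always sure" setting. Lowering the tolerance widens the band in which the model reports two candidate years.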
