Nevo

Reputation: 952

Building a classifier with only examples for one class

My company makes widgets. We make very high quality widgets, but occasionally a widget will suffer a defect known as a 'glurb'. A widget might never glurb over its entire lifetime, it may glurb once, or it may glurb multiple times. A widget's lifetime may be a few months or many years.

We maintain a database that lists every instance of a widget glurbing. For each glurb event, we know which widget glurbed, when it glurbed, and we have features about the widget before it glurbed. We know for 100% certain that when a widget glurbs, it is recorded in our database.

Management wants to build a machine learning model that, given a particular widget, will predict whether or not it will glurb in, say, the next six months.

I have a problem: I have a set of observations that show when a widget glurbs, which is the 'positive' training set, but I have no 'negative' (did not glurb) training set.

Is it statistically valid for me to choose a time, date, and widget at random, look into my database, and if I see that widget didn't glurb for 6 months after the chosen date/time, to declare that as an instance of a 'didn't glurb' event and put that in my 'negative' training set sample?

Is there a statistically valid way to generate a 'negative' test set from the data I have? If so, what would it be? If not, how could I build a classifier from the data I have?

Upvotes: 1

Views: 1330

Answers (2)

Brian O'Donnell

Reputation: 1876

There has been some research on "one-class classification". Here are a couple of papers:

If your data is in the form of images, you could try using Generative Adversarial Networks (GANs) to generate negative data. There is a post on this problem here: Could I use GANs to generate negative samples for one class classification? That post references Johannes' thesis.

If you program in Python, check out what scikit-learn has to offer:
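As one illustration, here is a minimal sketch using scikit-learn's `OneClassSVM`, which is fit on examples of a single class only. The feature matrix `X_glurb` is synthetic stand-in data (an assumption for the sketch, not real widget features), and the `nu` and `gamma` values are arbitrary choices rather than recommendations.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the features of widgets that glurbed (assumed data).
rng = np.random.default_rng(0)
X_glurb = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fit on the single available class. nu is an upper bound on the fraction of
# training points treated as outliers; 0.1 is an arbitrary choice here.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(X_glurb)

# predict returns +1 for points that resemble the training class, -1 otherwise.
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = model.predict(X_new)
print(labels)
```

Note that a model like this only says whether a widget looks like the ones that glurbed. It does not by itself encode the six-month horizon from the question.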

Upvotes: 0

Djib2011

Reputation: 7432

Yes, it is valid to do so. Depending on exactly what your management asked for, your approach is correct: by definition, you will be predicting whether or not a widget will glurb within the next 6 months.

Just remember that this is a different problem from predicting when a widget will glurb, or whether it will glurb at all during its lifetime.
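The sampling procedure described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `sample_negatives`, the `glurbs` and `lifetimes` dictionaries, and the 182-day `HORIZON` are hypothetical names and values, not anything from the actual database. The sketch also assumes a sampled time is only usable when the whole six-month window after it falls inside the widget's service life and the recorded data. Otherwise "no glurb recorded" could simply mean "not observed yet".

```python
import random
from datetime import datetime, timedelta

HORIZON = timedelta(days=182)  # roughly six months (assumed window length)


def sample_negatives(glurbs, lifetimes, n_samples, data_end, rng):
    """Draw (widget_id, time) pairs with no glurb in the following HORIZON.

    glurbs:    dict widget_id -> list of glurb datetimes (hypothetical layout)
    lifetimes: dict widget_id -> (start, end) datetimes in service
    data_end:  last datetime covered by the glurb database
    """
    negatives = []
    widget_ids = sorted(lifetimes)
    attempts = 0
    # Cap the attempts so the loop ends even if few valid negatives exist.
    while len(negatives) < n_samples and attempts < 100 * n_samples:
        attempts += 1
        wid = rng.choice(widget_ids)
        start, end = lifetimes[wid]
        # Latest time whose full horizon is still observed.
        latest = min(end, data_end) - HORIZON
        if latest <= start:
            continue  # widget not observed long enough for a full window
        offset = rng.random() * (latest - start).total_seconds()
        t = start + timedelta(seconds=offset)
        # Keep the sample only if no glurb falls in (t, t + HORIZON].
        if not any(t < g <= t + HORIZON for g in glurbs.get(wid, [])):
            negatives.append((wid, t))
    return negatives


# Toy usage with made-up widgets and dates.
lifetimes = {
    "a": (datetime(2020, 1, 1), datetime(2023, 1, 1)),
    "b": (datetime(2021, 1, 1), datetime(2021, 3, 1)),  # too short to sample
}
glurbs = {"a": [datetime(2021, 6, 1)]}
negs = sample_negatives(glurbs, lifetimes, 20, datetime(2023, 1, 1), random.Random(0))
```

The features for each sampled pair would then be taken as of the sampled time, in the same way they are recorded before a glurb.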

Upvotes: 1
