Reputation: 952
My company makes widgets. We make very high quality widgets, but occasionally a widget will suffer a defect known as a 'glurb'. A widget might never glurb over its entire lifetime, it may glurb once, or it may glurb multiple times. A widget's lifetime may be a few months or many years.
We maintain a database that lists every instance of a widget glurbing. For each glurb event, we know which widget glurbed, when it glurbed, and we have features describing the widget before it glurbed. We know with 100% certainty that when a widget glurbs, it is recorded in our database.
Management wants to build a machine learning model that, given a particular widget, will predict whether or not it will glurb in, say, the next six months.
I have a problem: I have a set of observations that show when a widget glurbs, which is the 'positive' training set, but I have no 'negative' (did not glurb) training set.
Is it statistically valid for me to choose a time, date, and widget at random, look into my database, and if I see that widget didn't glurb for 6 months after the chosen date/time, to declare that as an instance of a 'didn't glurb' event and put that in my 'negative' training set sample?
Is there a statistically valid way to generate a 'negative' test set from the data I have? If so, what would it be? If not, how could I build a classifier from the data I have?
Upvotes: 1
Views: 1330
Reputation: 1876
There has been some research on "one-class classification". Here are a couple of papers:
If your data is in the form of images, you could try using Generative Adversarial Networks (GANs) to generate negative data. There is a post on this problem, "Could I use GANs to generate negative samples for one class classification?", whose author references Johannes' thesis.
If you program in Python, check out what scikit-learn has to offer:
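One option there is `OneClassSVM`, which is fitted on examples of a single class only. The following is a minimal sketch, not a recommended configuration: the two-column feature matrix is synthetic placeholder data standing in for the real pre-glurb widget features, and the `nu` and `gamma` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# ASSUMPTION: synthetic stand-in for the features recorded before each glurb.
rng = np.random.default_rng(0)
X_glurb = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fit on the single available class (glurb events) only.
# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_glurb)

# predict() returns +1 for points that resemble the training class
# and -1 for points that do not.
X_new = np.array([[0.1, -0.2], [10.0, 10.0]])
pred = model.predict(X_new)
print(pred)
```

Note that a model like this only says whether a widget looks like past glurbing widgets; it does not by itself encode the six-month horizon.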
Upvotes: 0
Reputation: 7432
Yes, it is valid to do so. Provided this is what management asked for, you are 100% correct: by definition, you will be predicting whether or not a widget will glurb within the next 6 months.
Just remember that this is a different problem from predicting when a widget will glurb, or whether it will glurb at all during its lifetime.
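The sampling scheme from the question could be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes you have a list of all widgets with their in-service dates (including widgets that never glurbed), which the question does not confirm, and every name here (`widgets`, `glurbs`, `data_end`, `sample_snapshots`) is hypothetical. "Six months" is approximated as 182 days. Snapshots are only drawn where the full six-month window lies inside the observed period, so a missing glurb really means "did not glurb".

```python
import random
from datetime import datetime, timedelta

HORIZON = timedelta(days=182)  # ASSUMPTION: "six months" approximated as 182 days

def label_snapshot(glurb_times, snapshot, horizon=HORIZON):
    """Return 1 if any glurb falls in (snapshot, snapshot + horizon], else 0."""
    return int(any(snapshot < t <= snapshot + horizon for t in glurb_times))

def sample_snapshots(widgets, glurbs, data_end, n, horizon=HORIZON, seed=0):
    """Draw n random (widget, time) snapshots and label each one.

    widgets:  {widget_id: (in_service_start, in_service_end)}  (hypothetical table)
    glurbs:   {widget_id: [datetime, ...]} built from the glurb database
    data_end: last date for which the glurb database is complete
    """
    # Keep only widgets with room for a fully observed look-ahead window.
    eligible = {}
    for wid, (start, end) in widgets.items():
        latest = min(end, data_end) - horizon
        if latest > start:
            eligible[wid] = (start, latest)
    if not eligible:
        raise ValueError("no widget has a fully observed window")

    rng = random.Random(seed)
    ids = sorted(eligible)
    rows = []
    for _ in range(n):
        wid = rng.choice(ids)
        start, latest = eligible[wid]
        offset = rng.random() * (latest - start).total_seconds()
        snap = start + timedelta(seconds=offset)
        rows.append((wid, snap, label_snapshot(glurbs.get(wid, []), snap, horizon)))
    return rows

# Tiny made-up example: widget "a" glurbed once, widget "b" never did.
widgets = {
    "a": (datetime(2020, 1, 1), datetime(2022, 1, 1)),
    "b": (datetime(2020, 1, 1), datetime(2022, 1, 1)),
}
glurbs = {"a": [datetime(2020, 6, 1)]}
rows = sample_snapshots(widgets, glurbs, data_end=datetime(2022, 1, 1), n=50)
```

Because positives and negatives come out of the same random draw, the class balance in `rows` reflects how often a widget glurbs within six months of an arbitrary point in time. Features for each row would have to be computed as of the snapshot time, not as of a later glurb.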
Upvotes: 1