Reputation: 27275
First of all, this is a tough problem to solve. So far I haven't come up with a good example, but I hope someone here can figure it out. I hope there is a known way to solve this kind of problem, or some obscure algorithm.
Scenario:
Challenge
What I've tried
The limitations and weaknesses of this algorithm are pretty obvious. Although I've got good results in some cases, it doesn't always work as expected.
My current class works like this:
Dim Analyser As New ContentAnalyzer()
Analyser.AddTrueCase(True1Html)
Analyser.AddTrueCase(True2Html)
Analyser.AddTrueCase(True3Html)
' Returns True if UnknownHtml is similar to the True cases, otherwise False
Analyser.IsThisTrue(UnknownHtml)
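The actual algorithm behind ContentAnalyzer isn't shown, so the following is only a hypothetical Python analogue of the same interface (add_true_case / is_this_true), written as a baseline to compare against. It assumes that stripping tags with a crude regex and comparing word counts by cosine similarity is meaningful for these pages, and the 0.5 threshold is an arbitrary assumption that would need tuning.

```python
import math
import re
from collections import Counter


def tokenize(html: str) -> Counter:
    """Crudely strip tags and count lowercase word tokens (assumption: visible text matters most)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ContentAnalyzer:
    """Hypothetical analogue of the question's class: nearest-neighbour match on word counts."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # assumed cut-off, not taken from the question
        self.true_cases = []

    def add_true_case(self, html: str) -> None:
        self.true_cases.append(tokenize(html))

    def is_this_true(self, html: str) -> bool:
        # True if the page is close enough to at least one stored True case.
        doc = tokenize(html)
        best = max((cosine(doc, case) for case in self.true_cases), default=0.0)
        return best >= self.threshold


analyser = ContentAnalyzer()
analyser.add_true_case("<p>Order confirmed, thank you</p>")
analyser.add_true_case("<div>Your order is confirmed</div>")
print(analyser.is_this_true("<span>Order confirmed</span>"))  # True
print(analyser.is_this_true("<p>Page not found</p>"))  # False
```

Learning only from True cases means the threshold is the sole thing separating "similar" from "not similar", which is one reason this kind of approach works in some cases and not others.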
Sorry, the title doesn't make much sense; I couldn't find a good way to describe the problem.
Upvotes: 2
Views: 158
Reputation: 51511
Perhaps you mean something like Bayesian filtering? You could look at what Paul Graham has done with spam: http://www.paulgraham.com/better.html
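As a rough illustration, here is a simplified Python sketch in the spirit of Graham's token-probability filter, with "True" pages playing the role of spam and "False" pages the role of legitimate mail. It is not his exact code: it counts each token once per page, omits his extra weighting of the good corpus and his minimum-occurrence cut-off, and keeps only the clamping to [0.01, 0.99], the 0.4 default for unseen tokens, and the combination of the 15 most "interesting" token probabilities. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(html: str) -> set:
    """Distinct lowercase word tokens with tags stripped (each token counted once per page)."""
    return set(re.findall(r"[a-z0-9]+", re.sub(r"<[^>]+>", " ", html).lower()))


class GrahamStyleFilter:
    """Simplified, hypothetical sketch inspired by Graham's filter; True pages act as 'spam'."""

    def __init__(self):
        self.true_docs = Counter()   # token -> number of True pages containing it
        self.false_docs = Counter()  # token -> number of False pages containing it
        self.n_true = 0
        self.n_false = 0

    def train(self, html: str, is_true: bool) -> None:
        if is_true:
            self.true_docs.update(tokens(html))
            self.n_true += 1
        else:
            self.false_docs.update(tokens(html))
            self.n_false += 1

    def token_prob(self, tok: str) -> float:
        """Estimate P(True | token), clamped to [0.01, 0.99]; unseen tokens get 0.4."""
        t, f = self.true_docs[tok], self.false_docs[tok]
        if t + f == 0:
            return 0.4
        rt = t / max(self.n_true, 1)
        rf = f / max(self.n_false, 1)
        return min(0.99, max(0.01, rt / (rt + rf)))

    def probability_true(self, html: str, top_n: int = 15) -> float:
        # Keep the probabilities furthest from 0.5, then combine them naive-Bayes style.
        probs = sorted((self.token_prob(t) for t in tokens(html)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
        if not probs:
            return 0.5
        prod = math.prod(probs)
        inv = math.prod(1 - p for p in probs)
        return prod / (prod + inv)
```

Unlike a similarity threshold, this needs example pages of both kinds, and it produces a probability you can threshold (Graham used 0.9 for spam).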
Upvotes: 0
Reputation: 12492
It sounds like you're doing fairly simple document classification. This is a heavily researched field, especially lately due to spam filters. Look into a library for document classification in your language of choice.
Classifier4j looks like a popular library that runs on the Java VM and has been ported to .NET.
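To show what such a library typically does under the hood, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing in Python. This is a generic textbook technique, not Classifier4j's API; the tokenizer and the "true"/"false" labels are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def words(html: str) -> list:
    """Lowercase word tokens with tags crudely stripped (illustrative assumption)."""
    return re.findall(r"[a-z0-9]+", re.sub(r"<[^>]+>", " ", html).lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training pages
        self.vocab = set()

    def train(self, html: str, label: str) -> None:
        ws = words(html)
        self.word_counts[label].update(ws)
        self.doc_counts[label] += 1
        self.vocab.update(ws)

    def classify(self, html: str) -> str:
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + v
            # Work in log space to avoid underflow on long pages.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words(html):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on both True and False pages lets the model weigh evidence for each side instead of relying on a single similarity cut-off.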
Upvotes: 2
Reputation: 11910
Either this is really misstated or I'm just not getting something:
The application requests the web page, receives it, and has to determine whether it is another "True" or "False" case, right? In other words, the response doesn't tell you true or false up front, which is where my first confusion is.
Secondly, why not do a similar comparison against the false cases as well, and see whether there are enough similarities to sort any requested page into one of three buckets:
1) Page is more similar to true and thus is viewed as true.
2) Page is more similar to false and thus is viewed as false.
3) Page isn't more similar to either, so the result is something like a null or an exception, since it isn't possible to tell which result makes sense.
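The three-bucket idea can be sketched in Python as follows, assuming pages are compared by cosine similarity of their word counts. The margin of 0.1 is a hypothetical parameter: a page must be that much closer to one side to get a verdict, otherwise the function returns None for the undecidable third bucket.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def bag(html: str) -> Counter:
    """Word counts with tags crudely stripped (illustrative assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", re.sub(r"<[^>]+>", " ", html).lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(n * b[t] for t, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def classify(page: str, true_cases: List[str], false_cases: List[str],
             margin: float = 0.1) -> Optional[bool]:
    """Return True/False when one side is clearly closer, else None (the third bucket)."""
    doc = bag(page)
    best_true = max((cosine(doc, bag(c)) for c in true_cases), default=0.0)
    best_false = max((cosine(doc, bag(c)) for c in false_cases), default=0.0)
    if best_true - best_false > margin:
        return True
    if best_false - best_true > margin:
        return False
    return None  # too close to call


true_cases = ["<p>order confirmed thank you</p>"]
false_cases = ["<p>page not found</p>"]
print(classify("<b>order confirmed</b>", true_cases, false_cases))  # True
print(classify("<b>weather today</b>", true_cases, false_cases))  # None
```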
An example of where that third case could happen: suppose the page contains an integer, and if it is positive the result is true and if it is negative the result is false. What if the value is 0? Does 0 count as positive because it is equal to its absolute value, or does it count as negative for some reason?
Or am I way off in what you are trying to do here?
Upvotes: 1