Reputation: 12363
Hey, I have a task: retrieve PowerPoint presentations or PDF documents pertaining to a certain field. Say I want to retrieve PPT and PDF lecture notes on bioinformatics. Can this task be achieved by using neural bots trained by a neural network? I just want to confirm that this approach is not completely wrong before I proceed further with my implementation.
In case someone is wondering why a neural network, or any learning algorithm at all, is needed here, this is my plan (it might be wrong, or there might be an easier way to do this, so please feel free to correct me):
I generate neural bots trained by a neural network (I'm not sure yet how this training happens; I assume supervised learning on a sample training set of PPT and PDF files), and these bots then retrieve pages similar to what they learned during training.
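The plan above mixes two separate jobs: finding candidate PPT/PDF files, and deciding whether each one is about the target field. The first job needs no learning at all. A minimal sketch of it, assuming the candidate files are linked from ordinary HTML pages you have already downloaded, uses only the Python standard library; `find_document_links` and `DOC_EXTENSIONS` are hypothetical names chosen for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions treated as candidate lecture-note documents (an assumption; adjust as needed).
DOC_EXTENSIONS = (".pdf", ".ppt", ".pptx")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_document_links(html, base_url):
    """Return absolute URLs of links on the page whose path ends in a document extension."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        # Check only the path, so a query string such as ?download=1 does not hide the extension.
        if urlparse(url).path.lower().endswith(DOC_EXTENSIONS):
            found.append(url)
    return found


if __name__ == "__main__":
    page = """
    <a href="slides/intro_bioinformatics.pptx">Lecture 1</a>
    <a href="/notes/sequence-alignment.PDF?download=1">Lecture 2</a>
    <a href="schedule.html">Schedule</a>
    """
    for link in find_document_links(page, "https://example.edu/course/"):
        print(link)
```

Fetching pages and following links (the "bot" part) is left out here; the relevance decision is the part where a learned model comes in.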
So is the above approach a correct way to complete this task?
Upvotes: 2
Reputation: 9290
Neural nets are complicated. It seems like you have a generic document classification problem. The simplest place to start is a naive Bayes model with bag-of-words features. The next step I'd take from there is a linear SVM or logistic regression on the same feature set. If you still don't get the performance you want after trying the simpler models, then consider neural nets.
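A minimal sketch of that first step: a multinomial naive Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing, written with the Python standard library only. It assumes the text has already been extracted from each PDF/PPT file (extraction is not shown), and the tiny labelled corpus below is made up purely for illustration; a real training set would be a few hundred documents labelled by hand.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (the bag of words)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of log P(word | class)
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                numer = self.word_counts[label][tok] + 1
                score += math.log(numer / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy, hand-written training data (illustrative only).
    docs = [
        "sequence alignment with BLAST and hidden Markov models for protein families",
        "genome assembly from short reads and phylogenetic trees",
        "gene expression microarray analysis and protein structure prediction",
        "supply and demand curves, market equilibrium and elasticity",
        "Renaissance painting, perspective and the Florentine workshops",
        "quarterly earnings, balance sheets and cash flow statements",
    ]
    labels = ["bioinformatics"] * 3 + ["other"] * 3
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("lecture notes on protein sequence alignment"))
```

If you would rather not hand-roll this, scikit-learn's `CountVectorizer` with `MultinomialNB` covers the same step, and swapping in `LogisticRegression` or `LinearSVC` on the same vectorized features gives the second step described above.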
Just as you wouldn't say "I want to write an email server, so I'll start by writing an operating system," I'd be wary of using neural nets before simpler methods have failed.
Upvotes: 8