Reputation: 319
I have a PHP/MySQL application that stores symptoms and the appropriate drug. What machine learning algorithm should I use to predict the drug for any symptoms. Also, what would be the format of the training set?
Upvotes: 2
Views: 820
Reputation: 45
Try the K nearest neighbors, I think that's a classification problem. your prescription might need more than one drug or more, and the other problem is that the machine might not always be accurate as it will be given chance to decide on that were not trained for it. you need a very detailed dataset.
The below example is Bases on ml-idea (machine learning idea) Github- ML-Idea
though there is no perfect algorithm, just prepare your data correctly as good data counts.
`
//symptoms
//1 = 'Symptom 1';
//2 = 'Symptom 2';
//3 = 'Symptom 3';
//4 = 'Symptom 4';
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['drug a', 'drug A', 'drug x', 'drug x', 'drug a', 'drug x'];
$classifier = new KNearestNeighbors(6, true);
$classifier->train($samples, $labels);
$data = $classifier->predict([2, 1]);
echo "<pre>";
print_r($data);
echo "</pre>";
`
Upvotes: 0
Reputation: 48287
well your data will end up looking like this:
row_id symptom_x symptom_y degree_of_symptom_z ... best_drug
1 false true 0.8 drug_x
2 true null 0.0 drug_q
And you will use a statistical classifier to learn the best drug based on the symptoms. Then you will feed it new symptoms and it will indicate the best drug.
There will probably be lots of available symptoms, so the algo needs to be able to handle many columns.
I would start with Support Vector Machine, and also try Logistic Regression.
Check out RapidMiner.
Upvotes: 2
Reputation: 263
this is a classification problem: you have labelled data that you want to use to train a model.
As you are going to have some errors you should decide if minimise your false positive or your false negatives results and balance your algorithm to achieve that.
You can use a simple decision three and see how the performance are, using a test set like some real prescriptions form doctors.
Note that your prescription might need more than one drug or none.
One problem you should consider is that if you take some drugs you can't take others, and the patient can have some allergies. For that reason I would suggest you to have a look at http://en.wikipedia.org/wiki/Association_rule_learning and Prolog.
Upvotes: 0
Reputation: 77495
You will need to try hundreds of algorithms, preprocessings etc. yourself.
There is no general "best algorithm" for anything.
In particular not for data-driven things, when other don't have your data.
So, try out a number of things, to see what works for you. Because what works for other must not necessarily work for you, and the other way round.
Also, experience and expertise a must in order to get good results.
Upvotes: 0
Reputation: 2125
Seeing you will probably have a lot of unknown variables for this problem, I'd suggest approaching it using bayesian networks.
That would be just a guess based on that brief description and previous experience with medical diagnosis software (such as WebMD and others).
Bayesian networks tend to have higher "precision" when dealing with lots of unknown variables than most other ML algorithms (neural networks, for example, tend to need more accurate data in order to make an accurate regression - and therefore make accurate suggestions).
You'd need to do some research on overfitting prevention, smoothing and other issues you might encounter.
Again, this is not a definitive answer. You did not provide any detailed data for me to make a guess based on anything more than assumptions. I'd strongly suggest researching more deeply before deciding.
Upvotes: 0
Reputation: 48146
I think your best bet is to identify a solid library that integrates well in your environment.
In general:
Though I haven't used one recently, I believe SVM's are still likely to be your best bet if NN isn't good enough. It's not the hip new thing, but they're usually pretty good without too much tuning. But it's almost always better to use a well-tuned weak algorithm (i.e. one with docs you understand and an implementation where you can try lots of hyperparam variations) than a poorly-tuned strong algorithm. Certainly if you don't really know what you're doing.
In other words: keep it simple, and make sure you use lots of common sense in feature selection phase.
Upvotes: 1
Reputation: 2442
in ML there is no "best solution" for this scenario, there is almost always does this method/data satisfy my needs? So, try it with simple ML technique (e.g. decision trees), if it won't work, try something more sophisticated. If it won't work try to change data,...
Upvotes: 2