machine-learningdata-miningexpert-system

Reputation: 319

What Machine Learning Algorithm would be appropriate for this scenario

I have a PHP/MySQL application that stores symptoms and the appropriate drug. What machine learning algorithm should I use to predict the drug for any symptoms. Also, what would be the format of the training set?

Upvotes: 2

Answers (7)

Bruce Truth

Reputation: 45

Try the K nearest neighbors, I think that's a classification problem. your prescription might need more than one drug or more, and the other problem is that the machine might not always be accurate as it will be given chance to decide on that were not trained for it. you need a very detailed dataset.

The below example is Bases on ml-idea (machine learning idea) Github- ML-Idea

though there is no perfect algorithm, just prepare your data correctly as good data counts.

 //symptoms
  //1 = 'Symptom 1';
  //2 = 'Symptom 2';
  //3 = 'Symptom 3';
  //4 = 'Symptom 4';

$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['drug a', 'drug A', 'drug x', 'drug x', 'drug a', 'drug x'];

$classifier = new KNearestNeighbors(6, true);
$classifier->train($samples, $labels);
$data = $classifier->predict([2, 1]);

echo "<pre>";
print_r($data);
echo "</pre>";

Upvotes: 0

Neil McGuigan

Reputation: 48287

well your data will end up looking like this:

row_id  symptom_x symptom_y degree_of_symptom_z ... best_drug

1       false     true      0.8                     drug_x
2       true      null      0.0                     drug_q

And you will use a statistical classifier to learn the best drug based on the symptoms. Then you will feed it new symptoms and it will indicate the best drug.

There will probably be lots of available symptoms, so the algo needs to be able to handle many columns.

I would start with Support Vector Machine, and also try Logistic Regression.

Check out RapidMiner.

Upvotes: 2

user1883737

Reputation: 263

this is a classification problem: you have labelled data that you want to use to train a model.

As you are going to have some errors you should decide if minimise your false positive or your false negatives results and balance your algorithm to achieve that.

You can use a simple decision three and see how the performance are, using a test set like some real prescriptions form doctors.

Note that your prescription might need more than one drug or none.

One problem you should consider is that if you take some drugs you can't take others, and the patient can have some allergies. For that reason I would suggest you to have a look at http://en.wikipedia.org/wiki/Association_rule_learning and Prolog.

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77495

You will need to try hundreds of algorithms, preprocessings etc. yourself.

There is no general "best algorithm" for anything.

In particular not for data-driven things, when other don't have your data.

So, try out a number of things, to see what works for you. Because what works for other must not necessarily work for you, and the other way round.

Also, experience and expertise a must in order to get good results.

Upvotes: 0

Pedro Cordeiro

Reputation: 2125

Seeing you will probably have a lot of unknown variables for this problem, I'd suggest approaching it using bayesian networks.

That would be just a guess based on that brief description and previous experience with medical diagnosis software (such as WebMD and others).

Bayesian networks tend to have higher "precision" when dealing with lots of unknown variables than most other ML algorithms (neural networks, for example, tend to need more accurate data in order to make an accurate regression - and therefore make accurate suggestions).

You'd need to do some research on overfitting prevention, smoothing and other issues you might encounter.

Again, this is not a definitive answer. You did not provide any detailed data for me to make a guess based on anything more than assumptions. I'd strongly suggest researching more deeply before deciding.

Upvotes: 0

Eamon Nerbonne

Reputation: 48146

I think your best bet is to identify a solid library that integrates well in your environment.

In general:

Good data helps almost always: i.e. preprocess your data to extract features ("summaries") that you think would be useful to a human too.
Avoid useless features: prefer few good features over many tricky ones that might help slightly.
Be aware that there is unlikely to be a magic black box: you'll need to tune your algorithm. Most ML algorithms have several so called "hyperparameters" that affect how the algorithm works; e.g. learning rate; smoothing; window size etc.
Since it's not a black box, find some Machine Learning introduction and get at least a basic understanding of how and why these techniques work. It's easy to get complete nonsense from an ML algorithm, so it's important to have at least some idea of how these things work so you can set up your problem appropriately.
Try something really simple first, like nearest-neighbor (you'll need a distance metric). It's possibly enough.

Though I haven't used one recently, I believe SVM's are still likely to be your best bet if NN isn't good enough. It's not the hip new thing, but they're usually pretty good without too much tuning. But it's almost always better to use a well-tuned weak algorithm (i.e. one with docs you understand and an implementation where you can try lots of hyperparam variations) than a poorly-tuned strong algorithm. Certainly if you don't really know what you're doing.

In other words: keep it simple, and make sure you use lots of common sense in feature selection phase.

Upvotes: 1

xhudik

Reputation: 2442

in ML there is no "best solution" for this scenario, there is almost always does this method/data satisfy my needs? So, try it with simple ML technique (e.g. decision trees), if it won't work, try something more sophisticated. If it won't work try to change data,...

Upvotes: 2

What Machine Learning Algorithm would be appropriate for this scenario

Answers (7)

Related Questions