adam mcbrinn

Reputation: 17

Matlab Conditional probability from dataset

I have a 500x5 matrix M and I need to calculate conditional probabilities from it. I have discretised my data, and the code below currently works with only 3 of the 5 variables, which is fine for now.

The code below already counts the number of times I get A=1, B=1 and C=1, the number of times I get A=2, B=1, C=1, etc.

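data = M;

npatients = size(data,1);

% asum(h,i,j) counts the patients with A=h, B=i, C=j
asum = zeros(4,2,2);
prob = zeros(4,2,2);
for patient = 1:npatients
    h = data(patient,1);
    i = data(patient,2);
    j = data(patient,3);
    asum(h,i,j) = asum(h,i,j) + 1;
end
% prob(h,i,j) is the joint probability P(A=h, B=i, C=j)
for h = 1:4
    for i = 1:2
        for j = 1:2
            prob(h,i,j) = asum(h,i,j)/npatients;
        end
    end
end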

So I need code that sums over all C to get the number of times we get A=1 and B=1, in order to find:

Prob(C=1 given A=1 and B=1) = P(A=1, B=1, C=1) / P(A=1, B=1).

This is the rule strength of the first rule. I need to find out how to loop over A, B and C to get the rest, and how to actually get this to work in Matlab. I don't know if it's of any use, but I have code to put each column into its own variable:

dest = M(:,1); gen = M(:,2); age = M(:,3); year = M(:,4); dur = M(:,5);

So, say dest is the consequent and gen and age are the antecedents: how would I do this?

Below is the data of the first 10 patients as an example:

destination gender  age
       2    2   2
       2    2   2
       2    2   2
       2    2   2
       2    2   2
       2    1   1
       3    2   2
       2    2   2
       3    2   1
       3    2   1

Any help is appreciated and badly needed.

Upvotes: 1

Views: 4813

Answers (1)

Rash

Reputation: 4336

Since your code didn't work by copy & paste, I changed it a little bit.

It's better to define a function that calculates the probabilities for given data:

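function p = prob(data)
% data: column vector of observed values
% p: one row per distinct value, [probability, value]
n = size(data,1);
uniquedata = unique(data);
p = zeros(length(uniquedata),2);
p(:,2) = uniquedata;
for i = 1 : size(uniquedata,1)
    % fraction of rows equal to the i-th distinct value
    p(i,1) = sum(data == uniquedata(i)) / n;
end
end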

Now in another script,

data =[3    2   91;
       3    2   86;
       3    2   90;
       3    2   85;
       3    2   86;
       3    1   77;
       4    2   88;
       3    2   90;
       4    2   79;
       4    2   77;
       4    1   65;
       3    1   60];
pdest = prob(data(:,1));
pgend = prob(data(:,2));
page = prob(data(:,3));

This will give,

page =

0.0833   60.0000
0.0833   65.0000
0.1667   77.0000
0.0833   79.0000
0.0833   85.0000
0.1667   86.0000
0.0833   88.0000
0.1667   90.0000
0.0833   91.0000

pgend =

0.2500    1.0000
0.7500    2.0000

pdest =

0.6667    3.0000
0.3333    4.0000

That gives the probability of each value in each column.

Note that the second column of the output of prob holds the values and the first column the probabilities.
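As a side note, the same two-column result can be produced without an explicit loop. The following is only a sketch of one alternative: it swaps the loop in prob for unique plus accumarray, and uses the destination column of the example data typed in by hand.

```matlab
% Sketch: same [probability, value] layout as prob(), but without a loop.
% The vector below is the destination column of the 12-row example.
dest = [3; 3; 3; 3; 3; 3; 4; 3; 4; 4; 4; 3];

[vals, ~, idx] = unique(dest);      % idx(k) is the position of dest(k) in vals
counts = accumarray(idx, 1);        % number of occurrences of each distinct value
p = [counts / numel(dest), vals];   % column 1: probability, column 2: value

disp(p)
```

For this column the result should agree with pdest above (8 of 12 rows are 3, 4 of 12 rows are 4).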

When you want to calculate probabilities for dest = 3 & gend = 2, create a new data set that contains only the matching rows and call prob on it. To build the new data set, use:

mapd2g3 = data(:,1) == 3 & data(:,2) == 2;
datad2g3 = data(mapd2g3,:)

 3     2    91
 3     2    86
 3     2    90
 3     2    85
 3     2    86
 3     2    90

paged2g3 = prob(datad2g3(:,3))

0.1667   85.0000
0.3333   86.0000
0.3333   90.0000
0.1667   91.0000

This is prob(age | dest = 3 & gend = 2).
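If the data are already discretised into small integer codes, as in the question, all conditional probabilities can also be read off a joint count array in one step. This is a rough sketch rather than a drop-in solution: it assumes column 1 (destination, codes 1..4) is the consequent and columns 2 and 3 (gender and age, codes 1..2) are the antecedents, and it uses the ten sample rows from the question. The name condp is made up for illustration.

```matlab
% Sketch: P(dest = h | gender = i, age = j) from a joint count array.
% Ten sample rows from the question: [destination, gender, age].
data = [2 2 2; 2 2 2; 2 2 2; 2 2 2; 2 2 2; ...
        2 1 1; 3 2 2; 2 2 2; 3 2 1; 3 2 1];

% asum(h,i,j) = number of rows with destination h, gender i, age j
asum = accumarray(data, 1, [4 2 2]);

% Count of each (gender, age) pair, summed over all destinations (size 1x2x2)
antecedent = sum(asum, 1);

% Divide every joint count by the count of its antecedent pair.
% Pairs that never occur give 0/0 = NaN.
condp = bsxfun(@rdivide, asum, antecedent);

disp(condp(2,2,2))   % estimate of P(dest = 2 | gender = 2, age = 2)
```

To condition on a different pair of variables, sum over the dimension of the consequent instead of dimension 1. bsxfun is used so the sketch also runs on MATLAB releases without implicit expansion.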

You could even write a function to create the data sets.
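One possible shape for such a function is sketched below. selectrows is a hypothetical name, not something from the code above; it takes a list of column indices and the values those columns must equal, and returns the matching rows.

```matlab
% Hypothetical helper: keep the rows of data whose columns cols equal vals.
% cols and vals are row vectors of the same length.
selectrows = @(data, cols, vals) ...
    data(all(bsxfun(@eq, data(:,cols), vals), 2), :);

data = [3 2 91; 3 2 86; 3 2 90; 3 2 85; 3 2 86; 3 1 77; ...
        4 2 88; 3 2 90; 4 2 79; 4 2 77; 4 1 65; 3 1 60];

% Rows with dest = 3 (column 1) and gend = 2 (column 2)
sub = selectrows(data, [1 2], [3 2]);

% Fraction of the selected rows with age 86,
% i.e. an estimate of P(age = 86 | dest = 3, gend = 2)
p86 = mean(sub(:,3) == 86);
disp(p86)
```

Passing sub(:,3) to prob would then give the whole conditional distribution of age instead of a single value.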

Upvotes: 2
