I have a matrix M of size 500x5 and I need to calculate conditional probability. I have discretised my data, and I have this code that currently only works with 3 variables rather than 5, but that's fine for now.
The code below already works out the number of times I get A=1, B=1 and C=1, the number of times we get A=2, B=1, C=1, etc.
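data = M;
npatients = size(data,1);
asum = zeros(4,2,2);   % asum(h,i,j): number of patients with A=h, B=i, C=j
prob = zeros(4,2,2);   % prob(h,i,j): joint probability P(A=h, B=i, C=j)
for patient = 1:npatients
    h = data(patient,1);
    i = data(patient,2);
    j = data(patient,3);
    asum(h,i,j) = asum(h,i,j) + 1;
end
for h = 1:4
    for i = 1:2
        for j = 1:2
            prob(h,i,j) = asum(h,i,j) / npatients;
        end
    end
end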
So I need code to get the number of times we get A=1 and B=1 (adding over all C) to find:
Prob(C=1 given A=1 and B=1) = P(A=1, B=1, C=1) / P(A=1, B=1).
This is the rule strength of the first rule. I need to find out how to loop over A, B and C to get the rest, and how to actually get this to work in Matlab. I don't know if it's of any use, but I have code to put each column into its own variable:
dest = M(:,1); gen = M(:,2); age = M(:,3); year = M(:,4); dur = M(:,5);
So say dest is the consequent and gen and age are the antecedents, how would I do this?
Below is the data of the first 10 patients as an example:
destination gender age
2 2 2
2 2 2
2 2 2
2 2 2
2 2 2
2 1 1
3 2 2
2 2 2
3 2 1
3 2 1
Any help is appreciated and badly needed.
Since your code didn't work by copy & paste, I changed it a little bit.
It's better if you define a function that calculates the probability for given data:
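function p = prob(data)
    % Relative frequency of each distinct value in the column vector data.
    % Column 1 of p holds the probability, column 2 the value.
    n = size(data,1);
    uniquedata = unique(data);
    p = zeros(length(uniquedata),2);
    p(:,2) = uniquedata;
    for i = 1 : size(uniquedata,1)
        p(i,1) = sum(data == uniquedata(i)) / n;
    end
end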
Now in another script,
data =[3 2 91;
3 2 86;
3 2 90;
3 2 85;
3 2 86;
3 1 77;
4 2 88;
3 2 90;
4 2 79;
4 2 77;
4 1 65;
3 1 60];
pdest = prob(data(:,1));
pgend = prob(data(:,2));
page = prob(data(:,3));
This will give,
page =
0.0833 60.0000
0.0833 65.0000
0.1667 77.0000
0.0833 79.0000
0.0833 85.0000
0.1667 86.0000
0.0833 88.0000
0.1667 90.0000
0.0833 91.0000
pgend =
0.2500 1.0000
0.7500 2.0000
pdest =
0.6667 3.0000
0.3333 4.0000
That will give the probabilities you've already calculated.
Note that the second column of the output of prob holds the values and the first column the probabilities.
When you want to calculate probabilities for dest = 3 & gend = 2, you should create a new data set and call prob on it. To create the new data set, use:
mapd2g3 = data(:,1) == 3 & data(:,2) == 2;
datad2g3 = data(mapd2g3,:)
3 2 91
3 2 86
3 2 90
3 2 85
3 2 86
3 2 90
paged2g3 = prob(datad2g3(:,3))
0.1667 85.0000
0.3333 86.0000
0.3333 90.0000
0.1667 91.0000
This is prob(age | dest = 3 & gend = 2).
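If you want the conditional probability for every combination at once, rather than one filtered subset at a time, one option is to count the joint occurrences and divide by the counts of the antecedents. The following is only a minimal sketch, assuming every column holds positive integer category codes (as in the discretised data from the question); the names subs, counts, marg and condp are made up for illustration.

```matlab
% First 10 patients from the question: columns are [dest gend age]
data = [2 2 2; 2 2 2; 2 2 2; 2 2 2; 2 2 2; ...
        2 1 1; 3 2 2; 2 2 2; 3 2 1; 3 2 1];

% Reorder columns to [gend age dest] so the consequent (dest) is the last dimension
subs = data(:, [2 3 1]);

% counts(g,a,d) = number of rows with gend = g, age = a, dest = d
counts = accumarray(subs, 1);

% marg(g,a) = number of rows with gend = g and age = a (summed over all dest)
marg = sum(counts, 3);

% condp(g,a,d) = P(dest = d | gend = g, age = a)
% Entries are NaN (0/0) where the antecedent combination never occurs
condp = bsxfun(@rdivide, counts, marg);
```

Here condp(2,2,2) is the rule strength for "gend = 2 and age = 2 implies dest = 2". Putting the consequent in the last dimension means the marginal is a single sum along that dimension.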
You could even write a function to create the data sets.
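As a rough sketch of what such a function might look like, the anonymous helper below (the name subset is hypothetical) keeps only the rows whose chosen columns match the given values. To keep the snippet self-contained, the probabilities are tabulated inline with unique and accumarray instead of calling prob; the result has the same layout as the output of prob.

```matlab
% Same example data as in the script above: columns are [dest gend age]
data = [3 2 91; 3 2 86; 3 2 90; 3 2 85; 3 2 86; 3 1 77; ...
        4 2 88; 3 2 90; 4 2 79; 4 2 77; 4 1 65; 3 1 60];

% subset (hypothetical helper): keep the rows of d whose columns cols equal vals
subset = @(d, cols, vals) d(all(bsxfun(@eq, d(:, cols), vals), 2), :);

% Rows with dest == 3 and gend == 2
sel = subset(data, [1 2], [3 2]);

% Relative frequency of each age value within the selected rows
[agevals, ~, idx] = unique(sel(:, 3));
p_age = [accumarray(idx(:), 1) / numel(idx), agevals];
```

p_age has the probability in the first column and the age value in the second, so it can be compared directly with the filtered result shown earlier.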