Reputation: 1141
This is a very basic question, but I could not find enough reasons to convince myself: why must logistic regression use multiplication instead of addition for the likelihood function l(w)?
Upvotes: 0
Views: 398
Reputation: 838
Your question is more general than just joint likelihood for logistic regression. You're asking why we multiply probabilities instead of add them to represent a joint probability distribution. Two notes:
This applies when we assume the random variables are independent. Otherwise we need to calculate conditional probabilities using the chain rule of probability. You can look at Wikipedia for more information.
We multiply because that's how the joint distribution is defined. Here is a simple example:
Say we have two probability distributions:
X = 1, 2, 3, each with probability 1/3
Y = 0 or 1, each with probability 1/2
We want to calculate the joint likelihood L(X=x, Y=y), i.e. the probability that X takes the value x and Y takes the value y.
For example, L(X=1, Y=0) = P(X=1) * P(Y=0) = 1/3 * 1/2 = 1/6. It wouldn't make sense to write P(X=1) + P(Y=0) = 1/3 + 1/2 = 5/6: requiring *both* events to happen should not make the outcome more likely than either event alone, and with enough terms the sum would exceed 1.
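As a quick sanity check of the arithmetic above (a minimal sketch; the two distributions are assumed independent, as in the example):

```python
from fractions import Fraction

# Probabilities from the example above.
p_x1 = Fraction(1, 3)  # P(X = 1)
p_y0 = Fraction(1, 2)  # P(Y = 0)

# The joint probability is the product, not the sum.
joint = p_x1 * p_y0
print(joint)        # 1/6
print(p_x1 + p_y0)  # 5/6 -- not a valid joint probability here
```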
Now it's true that in maximum likelihood estimation we only care about the value of some parameter theta that maximizes the likelihood function. Since the logarithm is monotonically increasing, the theta that maximizes L(X=x, Y=y) also maximizes log L(X=x, Y=y). This is where the addition you may have seen comes into play: for independent variables,
log P(X=x, Y=y) = log P(X=x) + log P(Y=y)
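You can verify that identity numerically (a small sketch; the probability values are hypothetical, chosen only to illustrate):

```python
import math

# Hypothetical probabilities of independent events (assumed values).
probs = [1 / 3, 1 / 2, 0.9]

# Joint likelihood: the product of the individual probabilities.
likelihood = math.prod(probs)

# Log-likelihood: the sum of the logs -- this is where addition appears.
log_likelihood = sum(math.log(p) for p in probs)

# Log of the product equals the sum of the logs (up to floating point).
assert abs(math.log(likelihood) - log_likelihood) < 1e-12
```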
In short
This could be summarized as "joint probabilities represent an AND". When X and Y are independent, P(X AND Y) = P(X, Y) = P(X)P(Y). Not to be confused with P(X OR Y) = P(X) + P(Y) - P(X, Y).
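Tying this back to your original question about l(w): assuming the observations are independent, the logistic regression likelihood is a product of Bernoulli probabilities, so the log-likelihood is a sum. A minimal 1-D sketch (the data and parameter values below are hypothetical, just for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood for 1-D logistic regression.
    The independent observations multiply in the likelihood,
    so their logs add here."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)  # P(y = 1 | x) under the model
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hypothetical toy data and parameters.
xs = [0.5, -1.2, 2.0, 0.1]
ys = [1, 0, 1, 0]
print(log_likelihood(0.8, -0.1, xs, ys))
```

With w = b = 0, every predicted probability is 0.5, so the log-likelihood is 4 * log(0.5), which is easy to check by hand.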
Let me know if this helps.
Upvotes: 1