Arrebimbomalho
Arrebimbomalho

Reputation: 186

Automating a function to return an expression with math constants and unknowns

I am trying to build a transitions matrix from Panel data observations in order to obtain the ML estimators of a weighted transitions matrix. A key step is obtaining the individual likelihood function for individuals. Say you have the following data frame:

ID          Feature1  Feature2  Transition
120421006   10000        1         ab
120421006   12000        0         ba
120421006   10000        1         ab
123884392    3000        1         ab
123884392    2000        0         ba
908747738    1000        1         ab

The idea is to return, for each agent, the log-likelihood of his path. For agent 120421006, for instance, this boils down to (ignoring the initial term)

LL = log(exp(Yab)/1 + exp(Yab)) + log(exp(Yba) /(1 + exp(Yba))) + log(exp(Yab)/1 + exp(Yab))

i.e,

log(exp(Y_transition)/(1 + exp(Y_transition)))

where Y_transition = xFeature1 + yFeature2 for that transition, and x and y are unknowns.

For example, for individual 120421006, this would boil down to an expression with three elements, since he transitions thrice, and the function would return

LL = log(exp(10000x + 1y)/ 1 + exp(10000x + 1y)) +

log(exp(12000x + 0y)/ 1 + exp(12000x + 0y)) +

log(exp(10000x + 1y)/ 1 + exp(10000x + 1y))

And here's the catch: I need x and y to return as unknowns, since the objective is to obtain a sum over the likelihoods of all individuals in order to pass it to an ML estimator. How would you automate a function that returns this output for all IDs?

Many thanks in advance

Upvotes: 0

Views: 58

Answers (2)

Onyambu
Onyambu

Reputation: 79188

Create the function:

fun=function(x){
a=paste0("exp(",x[1],"*x","+",x[2],"*y)")
parse(text=paste("sum(",paste0("log(",a,"/(1+",a,"))"),")"))
}

by(test[2:3],test[,1],fun)

sum(log(exp(c(10000, 12000, 10000) * x + c(1, 0, 1) * y)/(1 + 
    exp(c(10000, 12000, 10000) * x + c(1, 0, 1) * y))))
-------------------------------------------------------------------- 
sum(log(exp(c(3000, 2000) * x + c(1, 0) * y)/(1 + exp(c(3000, 
    2000) * x + c(1, 0) * y))))
-------------------------------------------------------------------- 
sum(log(exp(1000 * x + 1 * y)/(1 + exp(1000 * x + 1 * y))))

taking an example of x=0 and y=3 we can solve this:

x=0
y=3
sapply(by(test[2:3],test[,1],fun),eval)
[1] -0.79032188 -0.74173453 -0.04858735

in your example above:

x=0
y=3
 log(exp(10000*x + 1*y)/ (1 + exp(10000*x + 1*y))) +#There should be paranthesis
  log(exp(12000*x + 0*y)/ (1 + exp(12000*x + 0*y))) + 
  log(exp(10000*x + 1*y)/( 1 + exp(10000*x + 1*y)))
[1] -0.7903219

To get what you need within the comments:

fun1=function(x){
    a=paste0("exp(",x[1],"*x","+",x[2],"*y)")
    paste("sum(",paste0("log(",a,"/(1+",a,"))"),")")
    }

paste(by(test[2:3],test[,1],fun1),collapse = "+")
1] "sum( log(exp(c(10000, 12000, 10000)*x+c(1, 0, 1)*y)/(1+exp(c(10000, 12000, 10000)*x+c(1, 0, 1)*y))) )+sum( log(exp(c(3000, 2000)*x+c(1, 0)*y)/(1+exp(c(3000, 2000)*x+c(1, 0)*y))) )+sum( log(exp(1000*x+1*y)/(1+exp(1000*x+1*y))) )"

But this doesnt make sense why you would group them and then sum all of them. That is same as just summing them without grouping them using the ID which would be simpler and faster

Upvotes: 1

Felipe Alvarenga
Felipe Alvarenga

Reputation: 2652

First you have to decide how flexible your function has to be. I am leaving it fairly rigid, but you can alter it at your flavor.

First, you have to input the initial guess parameters, which you will supply in the optimizer. Then, declare your data and variables to be used in your estimation.

Assuming you will always have only 2 variables (you can change it later)

y <- function(initial_param, data, features){

  x = initial_param[1]
  y = initial_param[2]

  F1 = data[, features[1]]
  F2 = data[, features[2]]

  LL = log(exp(F1 * x + F2 * y) / (1 + exp(F1 * x + F2 * y)))

  return(-sum(LL))
}

This function returns the sum of minus the log likelihood, given that most optimizers try to find the parameters at which your function reaches a minimum, by default.

To find your parameters just supply the below function with your likelihood function y, the initial parameters, data set and a vector with the names of your variables:

nlm(f = y,  initial_param = your_starting_guess, data = your_data,
                  features = c("name_of_first_feature", "name_of_second_feature"), iterlim=1000, hessian=F)

Upvotes: 1

Related Questions