candelas762

Reputation: 157

Can I use neural networks from the R package nnet to model compositional data?

I am working with compositional data (proportions that sum to 1) in R, and I have already tried Dirichlet regression as a parametric approach for modelling and predicting on new data. I would like to test non-parametric approaches such as neural networks or random forests, and I found the nnet package with its multinom() function, whose manual says it "Fits multinomial log-linear models via neural networks". However, I also found in another question that this function is not really calling any neural network at all, since the size argument (the size of the hidden layer) is set to 0, which, if I understood correctly, turns it into a parametric approach.

In the same question it was suggested to use the nnet() function directly with a response that has more than two levels and to set softmax = TRUE, but I don't know how to do this, and I get an error if I try to model the proportions directly (see the reproducible example).

Given a reproducible example dataset like this one:

set.seed(123)
n <- 300 # Number of observations

# Simulate predictor variables
data <- as.data.frame(matrix(runif(n * 10), nrow = n, ncol = 10))
names(data) <- paste0("V", 1:10)

# Simulate proportions for A, B, C that sum to 1
proportions <- matrix(runif(n * 3), ncol = 3)
row_sums <- rowSums(proportions)
proportions <- sweep(proportions, 1, row_sums, "/")
data$prop_A <- proportions[,1]
data$prop_B <- proportions[,2]
data$prop_C <- proportions[,3]

# Split into train/test
train_indices <- sample(1:n, size = 0.8 * n)
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

library(nnet)

# Fit nnet model with softmax = TRUE
model <- nnet(prop_A + prop_B + prop_C ~ ., data = train_data, size = 2, softmax = TRUE)

> Error in nnet.default(x, y, w, ...) : 
>   'softmax = TRUE' requires at least two response categories

I would like to know whether nnet is a plausible non-parametric approach for modelling compositional data, and how an experienced person would tackle this problem.

Upvotes: 1

Views: 71

Answers (2)

Michail

Reputation: 11

Yes you can. This is already implemented in the Compositional package; the function is called kl.compreg().
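A minimal sketch of how this might be called on the question's simulated data (the y-then-x argument order, the xnew argument, and the est return component are my reading of ?kl.compreg -- please verify against the package manual):

```r
library(Compositional)

# Reusing the question's simulated data (setup repeated here for completeness)
set.seed(123)
n <- 300
data <- as.data.frame(matrix(runif(n * 10), nrow = n, ncol = 10))
names(data) <- paste0("V", 1:10)
proportions <- matrix(runif(n * 3), ncol = 3)
proportions <- sweep(proportions, 1, rowSums(proportions), "/")

train <- sample(1:n, size = 0.8 * n)
y <- proportions[train, ]         # compositional response (rows sum to 1)
x <- as.matrix(data[train, ])     # predictors
xnew <- as.matrix(data[-train, ]) # new data to predict on

# KL-divergence-based compositional regression; 'est' should hold
# the predicted compositions for xnew
fit <- kl.compreg(y, x, xnew = xnew)
head(fit$est)
rowSums(fit$est)  # each predicted composition should sum to 1
```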

Upvotes: 1

Max

Reputation: 81

I think you could do something like this with cito. cito is a package for fitting DNNs with a simple user interface (including formula syntax), while using a state-of-the-art DL framework (torch) under the hood.

Unlike nnet, cito supports custom loss functions, so I think we can build what you want (unfortunately we do not have a multinomial loss at the moment, because it is not yet implemented in torch):

library(cito)

# Custom loss functions must be implemented in torch 
custom_loss = function(pred, true) {
  pred_prob = torch::nnf_softmax(pred, dim = 2)
  return(torch::nnf_binary_cross_entropy(pred_prob, true))
}

# Model call, by default a DNN with two layers and each with 50 nodes
model = dnn(cbind(prop_A, prop_B, prop_C)~., data = data, loss = custom_loss, lr = 0.1)

# Predict (raw network outputs, before the link function)
pred = predict(model)

# To forward them through the softmax link, we must transfer them to a torch object
pred_tensor = torch::torch_tensor(pred)

# Predictions - we must apply the link, and then transform it back to R
torch::nnf_softmax(pred_tensor, dim = 2)  |> as.matrix()
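The same pipeline should work for the held-out test set from the question, assuming cito's predict() method accepts a newdata argument as most R model objects do (an assumption -- check ?predict.citodnn):

```r
# Hypothetical usage on the question's test split; 'newdata' is assumed
pred_test <- predict(model, newdata = test_data)

# Apply the softmax link and convert back to an R matrix
pred_prob <- torch::nnf_softmax(torch::torch_tensor(pred_test), dim = 2) |> as.matrix()
rowSums(pred_prob)  # each row should sum to 1 after the softmax
```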

Upvotes: 1
