joran.g
joran.g

Reputation: 89

How do I change columns in the same order as in my order matrix, per row

I have a matrix where I stored the order of items of a questionnaire, where the first column contains the name of the item that is first shown, the second column has the second shown item, etc. Each row in this matrix represents a new questionnaire, with the same items but with the order randomized in a different order.

> order.matrix
     [,1]    [,2]    [,3]   
[1,] "Anger" "Happy" "Sad"  
[2,] "Happy" "Sad"   "Anger"
[3,] "Sad"   "Anger" "Happy"

I have stored the responses on the items in a dataframe:

> df.responses
  Anger Happy Sad
1     1     2   3
2     3     2   0
3     9     2   1

Now, I want to change the order of the responses in df.responses, such that they are analogue to the order of items in the order.matrix, for each row. (As a result, the column names of df.responses shouldn't be in the resulting df anymore.) The result in this example should look like this:

> df.result
  V1 V2 V3
1  1  2  3
2  2  0  3
3  1  9  2

How can/should I do this?

EDIT, due to comment: I want to replace the item names in order.matrix by its corresponding value in df.responses

Upvotes: 1

Views: 89

Answers (4)

koenniem
koenniem

Reputation: 576

A solution with purrr may be the following

df.result <- map2(.x = lapply(seq_len(nrow(responses)), function(i) responses[i,]),
                  .y = lapply(seq_len(nrow(order)), function(i) order[i,]),
                  .f = ~ .x[.y])
do.call("rbind", df.result)

In this code, the .x and .y are lists of vectors, i.e. a list of the rows (following this post https://stackoverflow.com/a/6821395/11086911). The output of map2 is then aggregated to a matrix with do.call and rbind.

In case anyone is curious as to how this compares to the other solutions, here is a comparison.

library(microbenchmark)
library(purrr)
set.seed(42) # For reproducibility purposes

# Comparison with given data
order.matrix <- matrix(c("Anger", "Happy", "Sad", "Happy", "Sad","Anger", "Sad", "Anger", "Happy"),
                       ncol=3,
                       byrow=TRUE)

df.responses <- matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1),
                       ncol=3,
                       byrow=TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")

solForLoop <- function(order, responses) {
  df.result <- responses
  colnames(df.result) <- paste0("V", 1:ncol(responses))
  for (i in 1:nrow(order)) {
    df.result[i,] <- responses[i,order[i,]]  
  }
  df.result
}

solmApply <- function(order, responses) {
  t(mapply(FUN = function(x, y) x[y], 
           as.data.frame(t(responses)),
           as.data.frame(t(order)),
           USE.NAMES = F
  ))
 }

solPurrr <- function(order, responses) {
  df.result <- map2(.x = lapply(seq_len(nrow(responses)), function(i) responses[i,]),
                    .y = lapply(seq_len(nrow(order)), function(i) order[i,]),
                    .f = ~ .x[.y])
  do.call("rbind", df.result)
}

microbenchmark::microbenchmark(
  solForLoop(order.matrix, df.responses),
  solmApply(order.matrix, df.responses),
  solPurrr(order.matrix, df.responses),
  times = 1000L,
  check = "equivalent"
)

# Unit: microseconds
#                                   expr     min      lq      mean   median       uq       max neval
# solForLoop(order.matrix, df.responses)   8.601  11.101  15.03331  15.9010  17.3020    62.002  1000
#  solmApply(order.matrix, df.responses) 313.801 346.701 380.32261 357.7510 374.2010 14322.900  1000
#   solPurrr(order.matrix, df.responses)  49.900  61.301  70.68950  70.7015  75.8015   190.700  1000

Given that the data is from a questionnaire, I will assume that every value in an order.matrix row can occur only once. For a matrix with the same 3 columns but 100 000 rows, we find that

# Comparison for big data
order.matrix.big <- as.matrix(sample_n(as.data.frame(order.matrix), 100000, replace = TRUE))
df.responses.big <- as.matrix(sample_n(as.data.frame(df.responses), 100000, replace = TRUE))

microbenchmark::microbenchmark(
    solForLoop(order.matrix.big, df.responses.big),
    solmApply(order.matrix.big, df.responses.big),
    solPurrr(order.matrix.big, df.responses.big),
    times = 100L,
    check = "equivalent"
)

# Unit: milliseconds
#                                           expr       min        lq      mean    median        uq       max neval
# solForLoop(order.matrix.big, df.responses.big)  110.2585  130.0916  163.3382  142.4249  167.7584  514.7262   100
#  solmApply(order.matrix.big, df.responses.big) 4669.8815 4866.6152 5232.1814 5160.2967 5385.5000 6568.1718   100
#   solPurrr(order.matrix.big, df.responses.big)  441.6195  502.0853  697.7207  669.4963  871.9122 1218.6721   100

So while map2 provides an interesting way of working for 'looping' over rows, in this case it is not as fast a simple for loop.

Upvotes: 1

dario
dario

Reputation: 6483

1.Create reproducible example

order.matrix <- matrix(c("Anger", "Happy", "Sad", "Happy", "Sad","Anger", "Sad", "Anger", "Happy"),
                       ncol=3,
                       byrow=TRUE)

df.responses <-matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1),
                        ncol=3,
                        byrow=TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")

2.Solution using base R:

result <- NULL
for (i in seq_along(order.matrix[, 1])) {
  result <- rbind(result, df.responses[i, order.matrix[i, ]])
}
colnames(result) <- c("V1", "V2", "V3")

        V1    V2  V3
[1,]     1     2   3
[2,]     2     0   3
[3,]     1     9   2

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101818

A base R option is to use mapply, i.e.,

df.result <- t(mapply(function(v,k) v[k], 
                      data.frame(t(df.responses)),
                      data.frame(t(order.matrix)),
                      USE.NAMES = F
                      )
               )

such that

> df.responses
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    0    3
[3,]    1    9    2

Upvotes: 1

Tensibai
Tensibai

Reputation: 15784

Using base R you could loop over the matrix rows and assign the values from your df.responses by selecting the column order by the matrix row values:

# copy df.responses so we won't grow an object in the loop
df.result <- df.responses
# Rename the columns as they won't be correct after
colnames(df.result) <- c("V1","V2","V3")

for (x in 1:nrow(order.matrix)) {
  # replace the line with the value ordered by the matrix line names
  df.result[x,] <- df.responses[x,order.matrix[x,]]  
}

Upvotes: 2

Related Questions