Reputation: 89
I have a matrix that stores the order in which the items of a questionnaire were shown: the first column contains the name of the item shown first, the second column the item shown second, and so on. Each row in this matrix represents a new questionnaire with the same items, but in a different random order.
> order.matrix
[,1] [,2] [,3]
[1,] "Anger" "Happy" "Sad"
[2,] "Happy" "Sad" "Anger"
[3,] "Sad" "Anger" "Happy"
I have stored the responses on the items in a dataframe:
> df.responses
Anger Happy Sad
1 1 2 3
2 3 2 0
3 9 2 1
Now I want to reorder the responses in each row of df.responses so that they follow the order of the items in the corresponding row of order.matrix. (As a result, the column names of df.responses should no longer appear in the resulting data frame.)
The result in this example should look like this:
> df.result
V1 V2 V3
1 1 2 3
2 2 0 3
3 1 9 2
How can/should I do this?
EDIT, due to a comment: I want to replace the item names in order.matrix with their corresponding values in df.responses.
Upvotes: 1
Views: 89
Reputation: 576
A solution with purrr may be the following:
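library(purrr)
# Split both matrices into lists of rows, then index each response row by its item order
df.result <- map2(.x = lapply(seq_len(nrow(df.responses)), function(i) df.responses[i, ]),
                  .y = lapply(seq_len(nrow(order.matrix)), function(i) order.matrix[i, ]),
                  .f = ~ .x[.y])
do.call("rbind", df.result)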
In this code, .x and .y are lists of vectors, i.e. lists of the rows (following this post: https://stackoverflow.com/a/6821395/11086911). The output of map2 is then combined into a matrix with do.call and rbind.
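To make the row lists concrete, here is a minimal base R sketch of what .x and .y hold for the example data, and of the indexing applied to a single row. The names response.rows, order.rows and second.row are illustrative only and are not part of the code above.

```r
# Example data from the question (df.responses stored as a matrix)
order.matrix <- matrix(c("Anger", "Happy", "Sad",
                         "Happy", "Sad", "Anger",
                         "Sad", "Anger", "Happy"),
                       ncol = 3, byrow = TRUE)
df.responses <- matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1), ncol = 3, byrow = TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")

# Illustrative names: one list element per matrix row
response.rows <- lapply(seq_len(nrow(df.responses)), function(i) df.responses[i, ])
order.rows <- lapply(seq_len(nrow(order.matrix)), function(i) order.matrix[i, ])

# Each response row is a named numeric vector, so it can be indexed by item name
second.row <- response.rows[[2]][order.rows[[2]]]
print(second.row)
```

Because each element of response.rows keeps the column names, indexing with the character vector from order.rows picks the values in the order the items were shown.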
In case anyone is curious as to how this compares to the other solutions, here is a comparison.
library(microbenchmark)
library(purrr)
set.seed(42) # For reproducibility purposes
# Comparison with given data
order.matrix <- matrix(c("Anger", "Happy", "Sad", "Happy", "Sad","Anger", "Sad", "Anger", "Happy"),
ncol=3,
byrow=TRUE)
df.responses <- matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1),
ncol=3,
byrow=TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")
solForLoop <- function(order, responses) {
df.result <- responses
colnames(df.result) <- paste0("V", 1:ncol(responses))
for (i in 1:nrow(order)) {
df.result[i,] <- responses[i,order[i,]]
}
df.result
}
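solmApply <- function(order, responses) {
  # match() turns item names into column positions; the data frame columns
  # passed to FUN are unnamed, so this does not depend on factor conversion
  t(mapply(FUN = function(x, y) x[match(y, colnames(responses))],
           as.data.frame(t(responses)),
           as.data.frame(t(order)),
           USE.NAMES = FALSE
  ))
}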
solPurrr <- function(order, responses) {
df.result <- map2(.x = lapply(seq_len(nrow(responses)), function(i) responses[i,]),
.y = lapply(seq_len(nrow(order)), function(i) order[i,]),
.f = ~ .x[.y])
do.call("rbind", df.result)
}
microbenchmark::microbenchmark(
solForLoop(order.matrix, df.responses),
solmApply(order.matrix, df.responses),
solPurrr(order.matrix, df.responses),
times = 1000L,
check = "equivalent"
)
# Unit: microseconds
# expr min lq mean median uq max neval
# solForLoop(order.matrix, df.responses) 8.601 11.101 15.03331 15.9010 17.3020 62.002 1000
# solmApply(order.matrix, df.responses) 313.801 346.701 380.32261 357.7510 374.2010 14322.900 1000
# solPurrr(order.matrix, df.responses) 49.900 61.301 70.68950 70.7015 75.8015 190.700 1000
Given that the data is from a questionnaire, I will assume that every value in an order.matrix row can occur only once. For a matrix with the same 3 columns but 100 000 rows, we find that:
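# Comparison for big data
# sample_n() comes from dplyr, which is not attached above, so it is called with its namespace
order.matrix.big <- as.matrix(dplyr::sample_n(as.data.frame(order.matrix), 100000, replace = TRUE))
df.responses.big <- as.matrix(dplyr::sample_n(as.data.frame(df.responses), 100000, replace = TRUE))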
microbenchmark::microbenchmark(
solForLoop(order.matrix.big, df.responses.big),
solmApply(order.matrix.big, df.responses.big),
solPurrr(order.matrix.big, df.responses.big),
times = 100L,
check = "equivalent"
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# solForLoop(order.matrix.big, df.responses.big) 110.2585 130.0916 163.3382 142.4249 167.7584 514.7262 100
# solmApply(order.matrix.big, df.responses.big) 4669.8815 4866.6152 5232.1814 5160.2967 5385.5000 6568.1718 100
# solPurrr(order.matrix.big, df.responses.big) 441.6195 502.0853 697.7207 669.4963 871.9122 1218.6721 100
So while map2 provides an interesting way of 'looping' over rows, in this case it is not as fast as a simple for loop.
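The assumption made before the large benchmark (every item occurs exactly once per row of order.matrix) can be checked before reordering. The following is a minimal sketch; item.names and row.is.permutation are illustrative names, not part of the benchmark code.

```r
order.matrix <- matrix(c("Anger", "Happy", "Sad",
                         "Happy", "Sad", "Anger",
                         "Sad", "Anger", "Happy"),
                       ncol = 3, byrow = TRUE)
item.names <- c("Anger", "Happy", "Sad")  # assumed to equal colnames(df.responses)

# TRUE if a row contains every item name exactly once
row.is.permutation <- function(items, item.names) {
  length(items) == length(item.names) && setequal(items, item.names)
}

valid.rows <- apply(order.matrix, 1, row.is.permutation, item.names = item.names)
stopifnot(all(valid.rows))

# A row with a repeated item fails the check
bad.row <- c("Anger", "Anger", "Sad")
print(row.is.permutation(bad.row, item.names))
```

If the check fails for some rows, the reordered result for those rows would repeat one response and drop another, so it is worth inspecting them first.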
Upvotes: 1
Reputation: 6483
1. Create a reproducible example
order.matrix <- matrix(c("Anger", "Happy", "Sad", "Happy", "Sad","Anger", "Sad", "Anger", "Happy"),
ncol=3,
byrow=TRUE)
df.responses <-matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1),
ncol=3,
byrow=TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")
2. Solution using base R:
result <- NULL
for (i in seq_along(order.matrix[, 1])) {
result <- rbind(result, df.responses[i, order.matrix[i, ]])
}
colnames(result) <- c("V1", "V2", "V3")
V1 V2 V3
[1,] 1 2 3
[2,] 2 0 3
[3,] 1 9 2
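The question stores the responses in a data frame, whereas the example above uses a matrix. A sketch of the same loop for a data frame input follows, assuming all responses are numeric. It converts to a matrix first because rbind() on data frames matches columns by name, which would undo the reordering. The name responses.matrix is illustrative.

```r
order.matrix <- matrix(c("Anger", "Happy", "Sad",
                         "Happy", "Sad", "Anger",
                         "Sad", "Anger", "Happy"),
                       ncol = 3, byrow = TRUE)
df.responses <- data.frame(Anger = c(1, 3, 9),
                           Happy = c(2, 2, 2),
                           Sad = c(3, 0, 1))

# Work on a matrix so that each row is a plain named vector
responses.matrix <- as.matrix(df.responses)

result <- NULL
for (i in seq_len(nrow(order.matrix))) {
  # unname() drops the item names, which no longer describe the columns
  result <- rbind(result, unname(responses.matrix[i, order.matrix[i, ]]))
}

# A matrix without column names becomes a data frame with columns V1, V2, V3
df.result <- as.data.frame(result)
print(df.result)
```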
Upvotes: 2
Reputation: 101818
A base R option is to use mapply, i.e.,
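# match() turns item names into column positions; the data frame columns
# passed to the function are unnamed, so this does not depend on factor conversion
df.result <- t(mapply(function(v, k) v[match(k, colnames(df.responses))],
                      data.frame(t(df.responses)),
                      data.frame(t(order.matrix)),
                      USE.NAMES = FALSE
                      )
               )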
such that
> df.result
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 0 3
[3,] 1 9 2
Upvotes: 1
Reputation: 15784
Using base R, you could loop over the matrix rows and fill each row of the result with the values from df.responses, selecting the columns in the order given by that matrix row:
# copy df.responses so we won't grow an object in the loop
df.result <- df.responses
# Rename the columns as they won't be correct after
colnames(df.result) <- c("V1","V2","V3")
for (x in 1:nrow(order.matrix)) {
# replace the line with the value ordered by the matrix line names
df.result[x,] <- df.responses[x,order.matrix[x,]]
}
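The same row-wise selection can also be written without an explicit loop by indexing the matrix with (row, column) pairs. This is a sketch of an alternative technique, not part of the answer above and not benchmarked here; it assumes df.responses is a matrix with item names as column names, and row.idx and col.idx are illustrative names.

```r
order.matrix <- matrix(c("Anger", "Happy", "Sad",
                         "Happy", "Sad", "Anger",
                         "Sad", "Anger", "Happy"),
                       ncol = 3, byrow = TRUE)
df.responses <- matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1), ncol = 3, byrow = TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")

# Column position in df.responses of every entry of order.matrix (column-major order)
col.idx <- match(order.matrix, colnames(df.responses))
# Row position of every entry of order.matrix, in the same column-major order
row.idx <- as.vector(row(order.matrix))

# Index with a two-column (row, column) matrix, then restore the original shape
df.result <- matrix(df.responses[cbind(row.idx, col.idx)],
                    nrow = nrow(order.matrix),
                    dimnames = list(NULL, paste0("V", seq_len(ncol(order.matrix)))))
print(df.result)
```

An item name in order.matrix that is not a column of df.responses gives NA in col.idx, so the corresponding cell of the result is NA rather than an error.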
Upvotes: 2