Reputation: 459
Here is a simplified example:
library(tidyverse)
frame <- tribble(
~a, ~b, ~c,
1, 1, 2,
5, 4, 7,
2, 3, 4,
3, 1, 6
)
key <- tribble(
~col, ~name, ~type, ~labels,
1, "a", "f", c("one", "two", "three", "four", "five"),
2, "b", "f", c("uno", "dos", "tres", "cuatro"),
3, "c", "f", 1:7
)
Is there an elegant way of programmatically sweeping across the columns in frame and applying the specific factor class, based on the parameters in key? The expected result would be:
# A tibble: 4 x 3
a b c
<fctr> <fctr> <fctr>
1 one uno 2
2 five cuatro 7
3 two tres 4
4 three uno 6
The best solution I have so far uses purrr's map2(), but with an assignment, which is IMO not the most elegant:
frame[key$col] <- map2(key$col, key$labels,
function(x, y) factor(frame[[x]], levels = 1:length(y), labels = y))
Does anyone have a more tidy solution? Note that my original data frame has hundreds of columns and I need to re-factor with different levels/labels a majority of them, so the process has to be automated.
Upvotes: 2
Views: 225
Reputation: 206
Here is another solution. I am not sure how "elegant" it is. Hopefully, someone can improve on that.
suppressPackageStartupMessages(library(tidyverse))
frame <- tribble(
~a, ~b, ~c,
1, 1, 2,
5, 4, 7,
2, 3, 4,
3, 1, 6
)
key <- tribble(
~col, ~name, ~type, ~labels,
1, "a", "f", c("one", "two", "three", "four", "five"),
2, "b", "f", c("uno", "dos", "tres", "cuatro"),
3, "c", "f", 1:7
)
colnames(frame) %>%
map(~ {
factor(pull(frame, .x),
levels = 1:length(pluck(key[key$name == .x, "labels"], 1, 1)),
labels = pluck(key[key$name == .x, "labels"], 1, 1))
}) %>%
set_names(colnames(frame)) %>%
as_tibble()
#> # A tibble: 4 x 3
#> a b c
#> <fctr> <fctr> <fctr>
#> 1 one uno 2
#> 2 five cuatro 7
#> 3 two tres 4
#> 4 three uno 6
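A possible variation, offered as a sketch rather than a definitive answer: dplyr's across() together with cur_column() can look up the labels for each column by name, which avoids filtering key twice per column. The name label_lookup is made up for this illustration, and the sketch assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tibble)

frame <- tribble(
  ~a, ~b, ~c,
  1, 1, 2,
  5, 4, 7,
  2, 3, 4,
  3, 1, 6
)
key <- tribble(
  ~col, ~name, ~type, ~labels,
  1, "a", "f", c("one", "two", "three", "four", "five"),
  2, "b", "f", c("uno", "dos", "tres", "cuatro"),
  3, "c", "f", 1:7
)

# Hypothetical helper: named list mapping column name -> label vector
label_lookup <- setNames(key$labels, key$name)

result <- frame %>%
  mutate(across(
    all_of(key$name),
    # cur_column() returns the name of the column currently being processed
    ~ factor(.x,
             levels = seq_along(label_lookup[[cur_column()]]),
             labels = label_lookup[[cur_column()]])
  ))
```

Because the lookup is by name, the order of rows in key does not have to match the order of columns in frame.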
Upvotes: 1
Reputation: 79288
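For this question, you can use base R:
# stringsAsFactors = TRUE is needed on R >= 4.0, where data.frame() no longer converts strings to factors by default
(A <- `names<-`(data.frame(mapply(function(x, y) x[y], key$labels, frame), stringsAsFactors = TRUE), key$name))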
a b c
1 one uno 2
2 five cuatro 7
3 two tres 4
4 three uno 6
sapply(A,class)
a b c
"factor" "factor" "factor"
Upvotes: 0
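One caveat about the base R one-liner above: because it indexes the labels by value and only then converts to factors, each resulting factor carries just the labels that actually occur in the data, in alphabetical order. If the full label set should be kept as levels, a base R sketch along the following lines may help. Here frame and key are plain base R stand-ins for the tibbles in the question, which is an assumption made to keep the example self-contained.

```r
# Base R stand-ins for the question's tibbles (assumption for illustration)
frame <- data.frame(a = c(1, 5, 2, 3), b = c(1, 4, 3, 1), c = c(2, 7, 4, 6))
key <- list(
  name   = c("a", "b", "c"),
  labels = list(c("one", "two", "three", "four", "five"),
                c("uno", "dos", "tres", "cuatro"),
                1:7)
)

# Map() walks over the columns and their label vectors in parallel;
# seq_along(labs) supplies one integer code per label.
frame[key$name] <- Map(
  function(x, labs) factor(x, levels = seq_along(labs), labels = labs),
  frame[key$name], key$labels
)

levels(frame$a)  # all five labels are kept, including the unused "four"
```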
Reputation: 4534
I'm interested to see what other solutions are proposed for this. My only suggestion is to change the proposed solution slightly so it is clearer that frame is going to be modified in some way, rather than leaving it in the body of the function used by map2.
For example, pass frame as an additional argument in the call to map2:
frame[key$col] <- map2(key$col, key$labels,
function(x, y, z) factor(z[[x]], levels = 1:length(y), labels = y),
frame)
Or do the same thing using the pipe operator %>%:
frame[key$col] <- frame %>%
{ map2(key$col, key$labels,
function(x, y, z) factor(z[[x]], levels = 1:length(y), labels = y), .) }
Upvotes: 0
Reputation: 10222
I don't know if this answer satisfies your requirements of being tidy as it uses a plain old for-loop. But it does the job and in my opinion is easy to read/understand as well as reasonably fast.
library(tidyverse)
frame <- tribble(
~a, ~b, ~c,
1, 1, 2,
5, 4, 7,
2, 3, 4,
3, 1, 6
)
key <- tribble(
~col, ~name, ~type, ~labels,
1, "a", "f", c("one", "two", "three", "four", "five"),
2, "b", "f", c("uno", "dos", "tres", "cuatro"),
3, "c", "f", 1:7
)
for (i in 1:nrow(key)) {
var <- key$name[[i]]
x <- frame[[var]]
labs <- key$labels[[i]]
lvls <- seq_along(labs) # one integer code per label; must match length(labs), not the number of rows
frame <- frame %>% mutate(!! var := factor(x, levels = lvls, labels = labs))
}
frame
#> # A tibble: 4 x 3
#> a b c
#> <fctr> <fctr> <fctr>
#> 1 one uno 2
#> 2 five cuatro 7
#> 3 two tres 4
#> 4 three uno 6
The typical tidy approach would be to reshape the data so that all variables are in one column, apply a function to that column, and finally reshape it back to the original format. However, a single factor column cannot hold a different set of levels per variable, so we need to use other means. Are factors even considered tidy?
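To illustrate the reshape idea and where it falls short, here is a rough sketch under some assumptions: lookup, long and wide are names invented for this example, and the labels are coerced to character so they can share one column. The round trip recovers the label text, but the columns come back as character, so the factor levels would still have to be applied per column afterwards.

```r
library(dplyr)
library(tidyr)
library(tibble)

frame <- tribble(
  ~a, ~b, ~c,
  1, 1, 2,
  5, 4, 7,
  2, 3, 4,
  3, 1, 6
)
key <- tribble(
  ~col, ~name, ~type, ~labels,
  1, "a", "f", c("one", "two", "three", "four", "five"),
  2, "b", "f", c("uno", "dos", "tres", "cuatro"),
  3, "c", "f", 1:7
)

# Long lookup table: one row per (column name, integer code, label)
lookup <- key %>%
  select(name, labels) %>%
  mutate(labels = lapply(labels, as.character)) %>%  # common type for unnest()
  unnest(labels) %>%
  group_by(name) %>%
  mutate(code = row_number()) %>%
  ungroup()

# Reshape to long format and attach the label for each (name, code) pair
long <- frame %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "name", values_to = "code") %>%
  left_join(lookup, by = c("name", "code"))

# Reshape back: the label text survives, the factor levels do not
wide <- long %>%
  select(row, name, labels) %>%
  pivot_wider(names_from = name, values_from = labels) %>%
  select(-row)
```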
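Regarding my assumption that the for-loop would perform similarly to the map2 version: I was wrong. In the timings below, the for-loop is roughly 28 times slower by median (about 12,564 vs. 451 microseconds).
Here are some benchmarks: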
library(microbenchmark)
frame1 <- frame
frame2 <- frame
microbenchmark(
map2 = {
frame1[key$col] <- map2(key$col, key$labels,
function(x, y) factor(frame[[x]],
levels = 1:max(frame[[x]],
length(y)),
labels = y))
},
forloop = {
for (i in 1:nrow(key)) {
var <- key$name[[i]]
x <- frame2[[var]]
labs <- key$labels[[i]]
lvls <- 1:max(length(x), length(labs))
frame2 <- frame2 %>% mutate(!! var := factor(x, levels = lvls, labels = labs))
}
}
)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# map2 375.53 416.5805 514.3126 450.825 484.2175 3601.636 100 a
# forloop 11407.80 12110.0090 12816.6606 12564.176 13425.6840 16632.682 100 b
Upvotes: 0