Reputation: 17418
This code:
tips <- data.frame(
gender = c("female", "male", "male")
,smoker = c("yes", "no", "no")
)
tblFun <- function(x) {
tbl <- table(x)
res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
colnames(res) <- c('Count', 'Percentage')
res
}
do.call(rbind, lapply(tips[1:2], tblFun))
produces this:
Count Percentage
female 1 33.33
male 2 66.67
no 2 66.67
yes 1 33.33
which is great. However, I would like to produce this:
key_value_pair Count Percentage
gender=female 1 33.33
gender=male 2 66.67
smoker=no 2 66.67
smoker=yes 1 33.33
Could someone please be so kind and suggest a solution? Thanks!
Upvotes: 0
Views: 91
Reputation: 160607
One way is to expand your tblFun
function to accept the category name and prepend it to the labels.
tblFun <- function(x, nm = character(0)) {
tbl <- table(x)
if (length(nm)) names(tbl) <- paste(nm[[1]], names(tbl), sep = "=")
res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
colnames(res) <- c('Count', 'Percentage')
res
}
Given no change, it behaves as before:
do.call(rbind, lapply(tips[1:2], tblFun))
# Count Percentage
# female 1 33.33
# male 2 66.67
# no 2 66.67
# yes 1 33.33
In order to pass each column's name with each column, you need to use the multi-argument version of lapply
, Map
:
do.call(rbind, Map(tblFun, tips[1:2], names(tips[1:2])))
# Count Percentage
# gender=female 1 33.33
# gender=male 2 66.67
# smoker=no 2 66.67
# smoker=yes 1 33.33
An alternative is to use purrr::imap
, which passes both the object and its name (as a second argument) to the function:
do.call(rbind, purrr::imap(tips[1:2], tblFun))
# Count Percentage
# gender=female 1 33.33
# gender=male 2 66.67
# smoker=no 2 66.67
# smoker=yes 1 33.33
One advantage I see is no need to include both tips[1:2]
and names(tips[1:2])
, though if you aren't already using purrr
or tidyverse-packages, then adding another package just for this might not be desired (esp when Map
does the same thing with the explicit names()
).
As a brief demonstration of what Map
is doing: it "zips" arguments together.
As a comparison, lapply
(and family) run a function once for each element of its input vector/list. So lapply(1:3, myfunc)
"unrolls" to
list(
myfunc(1),
myfunc(2),
myfunc(3)
)
If you try to provide multiple vectors, however, it does not perform as one "might" want/think: lapply(1:3, myfunc, 11:13)
unrolls to:
list(
myfunc(1, 11:13),
myfunc(2, 11:13),
myfunc(3, 11:13)
)
Map
does it for an arbitrary number of vectors/lists, so Map(myfunc, 1:3, 11:13, 21:23, 99)
unrolls to
list(
myfunc(1, 11, 21, 99),
myfunc(2, 12, 22, 99),
myfunc(3, 13, 23, 99)
)
(Notice how length-one vectors are recycled. Though it does do recycling of lengths between 1 and the length of the longest vector, I don't recommend relying on it unless you strictly control the fact that the shorter vectors should multiply out to the length of the longest with no remainder.)
myfunc
in this case must accept (at least) three arguments. Two notable differences between lapply
and Map
:
lapply
puts data-first, function-second; because Map
accepts one or more vectors/lists, it puts the function-first, one-or-more data second+;Map
will recycle a singular list
argument, so Map(myfunc, 1:3, list(11:13)
unrolls to list(myfunc(1, 11:13), myfunc(2, 11:13), myfunc(3, 11:13))
, which on the surface looks very similar to lapply(1:3, myfunc, 11:13)
but can be handy when you have more than two vectors of inputs.Map
with do.call
, as in
l <- list(1:3, 11:13, 21:21)
do.call("Map", c(f = myfunc, l))
(as long as myfunc
accepts arbitrary number of arguments, likely via ...
mechanics). The first and only named argument of Map
is f=
for the function; it isn't required to name it here, but I like to for clarity.lapply
has the optionally-simplifying version sapply
, Map
has the optionally-simplifying version mapply
. I tend to prefer explicit -- there's nothing more frustrating than expecting vector output but one input causes the output to be a list
...Upvotes: 2
Reputation: 10771
I would use the tidyverse
and some data manipulation:
library(tidyverse)
tips %>%
gather(key_value, value) %>% # wide to long
count(key_value, value) %>%
group_by(key_value) %>%
mutate(percentage = n / sum(n)) %>%
unite(key_value_pair, key_value, value, sep = "=") # convert 2 cols into 1
# key_value_pair n percentage
# <chr> <int> <dbl>
# 1 gender=female 1 0.333
# 2 gender=male 2 0.667
# 3 smoker=no 2 0.667
# 4 smoker=yes 1 0.333
Upvotes: 3