I would like to know how to use a data.table expression inside a user-defined function, passing the column names as arguments.
I wrote the following data.table code to calculate the % of 'b' responses out of all valid responses ('a' or 'b') by two grouping variables, grp1 and grp2:
The data (creating it gives a warning message):
library(data.table)
dt = data.table(grp1 = c("I", "II", "III", "IV"),
                grp2 = c("A", "B", "C"),
                Q1   = rep(c("a", "a", "b", "b", "b"), 20))
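The warning comes from recycling: Q1 has 100 values, and grp1's 4 labels divide 100 evenly, but grp2's 3 labels do not. The sketch below is an assumption on my part, not part of the original post: it rebuilds the columns with base R's rep_len(), which should reproduce data.table's recycling while making it explicit (so no warning), and then counts the rows in each group.

```r
library(data.table)

# Q1 has 100 values; 100 is a multiple of 4 (grp1) but not of 3 (grp2)
100 %% 4   # 0
100 %% 3   # 1, the remainder behind the recycling warning

# rep_len() recycles each label vector to exactly 100 values up front
# (assumed to match what data.table does implicitly, minus the warning)
dt <- data.table(grp1 = rep_len(c("I", "II", "III", "IV"), 100),
                 grp2 = rep_len(c("A", "B", "C"), 100),
                 Q1   = rep(c("a", "a", "b", "b", "b"), 20))

# Uneven recycling means the 12 groups do not all have the same size
sizes <- dt[, .N, keyby = .(grp1, grp2)]
sizes
```

The four groups with 9 rows, (I, A), (II, B), (III, C) and (IV, A), are the ones whose percentages in the output below are ninths (55.56 and 66.67); the others have 8 rows, giving eighths (50 and 62.5).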
The code to calculate the percentage of respondents:
dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]
This produces what I need (thanks to @Frank for your help at "Calculate % respondents by more than one group for a survey data"):
    grp1 grp2       V1
 1:    I    A 55.55556
 2:    I    B 62.50000
 3:    I    C 62.50000
 4:   II    A 62.50000
 5:   II    B 55.55556
 6:   II    C 62.50000
 7:  III    A 50.00000
 8:  III    B 62.50000
 9:  III    C 66.66667
10:   IV    A 66.66667
11:   IV    B 62.50000
12:   IV    C 50.00000
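For a single group, the j expression just counts the 'b' answers and divides by the number of non-missing answers. As a minimal check, the vector x below holds the nine Q1 answers of group (I, A) in row order (rows 1, 13, 25, ..., 97), plus one hypothetical NA that is not in the data, added only to show how missing answers are treated.

```r
# Q1 answers of group (I, A) in row order, plus one hypothetical NA
x <- c("a", "b", "b", "a", "b", "a", "b", "b", "a", NA)

n_b     <- sum(x %in% "b")   # %in% returns FALSE for NA, so NA never counts as "b"
n_valid <- sum(!is.na(x))    # NA is also dropped from the denominator
n_b       # 5
n_valid   # 9
pct <- n_b / n_valid * 100
pct       # 55.55556, the first row of the output above
```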
What I would like to do is create a function and use it to calculate the equivalent set of values for 50 other items. I wrote the following function, hoping to avoid repeating the same code for each item:
test = function(question, groupA, groupB){
dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100, by = eval((c(groupA, groupB)))][order(groupA, groupB)]
}
test(question = "Q1", groupA = "grp1", groupB ="grp2")
However, this returns only the top row:
   grp1 grp2       V1
1:    I    A 55.55556
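The single row can be reproduced step by step outside the function. This is a sketch run at the top level rather than inside test(); it rebuilds the data with rep_len() (assumed to match the original recycling, without the warning). It shows that the grouping itself returns all 12 groups, and that it is the trailing [order(groupA, groupB)] that drops the rows, because groupA and groupB are plain strings there, not columns.

```r
library(data.table)

# Same data as in the question, with the recycling made explicit
dt <- data.table(grp1 = rep_len(c("I", "II", "III", "IV"), 100),
                 grp2 = rep_len(c("A", "B", "C"), 100),
                 Q1   = rep(c("a", "a", "b", "b", "b"), 20))

groupA <- "grp1"
groupB <- "grp2"

# Step 1: grouping by a character vector of column names works (12 groups)
res <- dt[, sum(Q1 %in% "b") / sum(!is.na(Q1)) * 100, by = c(groupA, groupB)]
nrow(res)               # 12

# Step 2: order() on two length-one strings simply returns 1 ...
order(groupA, groupB)   # 1

# ... so the chained subset keeps only the first row
one_row <- res[order(groupA, groupB)]
nrow(one_row)           # 1
```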
I've read other questions on Stack Overflow (e.g. "Using data.table i and j arguments in functions") and tried other code, but I haven't been able to get it to work.
I'm new to R and would very much appreciate any feedback you may have.
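In your function, the issue is in how the grouping and ordering columns are specified. Inside test(), groupA and groupB hold the strings "grp1" and "grp2", so the chained [order(groupA, groupB)] is just order("grp1", "grp2"), which returns 1 and keeps only the first row. Instead, we can pass c(groupA, groupB) to keyby rather than by, which does the grouping and the sorting in one step: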
test = function(question, groupA, groupB){
  # get(question) looks up the column named by the string `question`;
  # keyby groups by the columns named in c(groupA, groupB) and sorts by them.
  # Note that the function reads the global table `dt`.
  dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100,
     keyby = c(groupA, groupB)]
}
ans = test(question = "Q1", groupA = "grp1", groupB = "grp2")
#     grp1 grp2       V1
#  1:    I    A 55.55556
#  2:    I    B 62.50000
#  3:    I    C 62.50000
#  4:   II    A 62.50000
#  5:   II    B 55.55556
#  6:   II    C 62.50000
#  7:  III    A 50.00000
#  8:  III    B 62.50000
#  9:  III    C 66.66667
# 10:   IV    A 66.66667
# 11:   IV    B 62.50000
# 12:   IV    C 50.00000
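Since the goal is to repeat this for about 50 items, one option (a swapped-in technique, not part of the answer above) is to let .SDcols select all the item columns and compute every percentage in a single call with lapply(.SD, ...). The helper name pct_b(), its arguments, and the small two-item table below are hypothetical, made up for illustration; the helper also takes the table as an argument instead of reading the global dt.

```r
library(data.table)

# Hypothetical survey data: two items, Q1 and Q2, with one missing answer
dt <- data.table(grp1 = rep_len(c("I", "II"), 8),
                 grp2 = rep(c("A", "B"), each = 4),
                 Q1   = c("a", "b", "b", "b", "a", "a", "b", NA),
                 Q2   = c("b", "b", "a", "b", "a", "b", "b", "b"))

# Hypothetical helper: % of `value` among non-missing answers, for every
# column in `questions`, grouped and sorted by the columns in `groups`
pct_b <- function(DT, questions, groups, value = "b") {
  DT[, lapply(.SD, function(x) sum(x %in% value) / sum(!is.na(x)) * 100),
     keyby = groups, .SDcols = questions]
}

res <- pct_b(dt, questions = c("Q1", "Q2"), groups = c("grp1", "grp2"))
res
```

Each item becomes its own column in the result (here Q1 and Q2), so with the real data the questions argument could simply be a character vector of all 50 item names.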