Reputation: 491
I have a table which looks like this:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2))
I want to count number of variables apart from intercept, month, ratediff in each model. My desired output is:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2), variables = c(3,3,3,3,3,3,3,3,3,3,3,3))
I tried to get a flag using:
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))
but the grep(month)
doesn't work.
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))
This works, but month followed by the suffix isn't captured.
I want something equivalent to ~ilike~ from sql on the variables intercept, month and RateDiff, as I do not want it to be case sensitive and want to allow for suffix and prefix on the variables. How could I achieve this?
Upvotes: 1
Views: 155
Reputation: 11150
Here's one way with dplyr
-
modelsummary %>%
mutate(
variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>%
n_distinct()
)
term mod_id variables
1 (Intercept) 1 3
2 month1 1 3
3 month2 1 3
4 RateDiff 1 3
5 var1 1 3
6 var2 1 3
7 var3 1 3
8 (Intercept) 2 3
9 month1 2 3
10 var1 2 3
11 var2 2 3
12 var3 2 3
Or with dplyr
and stringr
:
modelsummary %>%
mutate(
variables = str_subset(tolower(term), "intercept|month|ratediff", TRUE) %>%
n_distinct()
)
Add group_by(mod_id)
before the mutate
if you want to count the number of variables per mod_id
.
In base R -
modelsummary$variables <- with(modelsummary,
term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>%
unique() %>% length()
)
Upvotes: 2