Bruce Wayne
Bruce Wayne

Reputation: 491

R: Count number of rows per model excluding certain variables if present

I have a table which looks like this:

modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2))

I want to count number of variables apart from intercept, month, ratediff in each model. My desired output is:

modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2), variables = c(3,3,3,3,3,3,3,3,3,3,3,3))

I tried to get a flag using:

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1, 
                  function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))

but the grep(month) doesn't work.

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1, 
                  function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))

This works, but month followed by the suffix isn't captured.

I want something equivalent to ~ilike~ from sql on the variables intercept, month and RateDiff, as I do not want it to be case sensitive and want to allow for suffix and prefix on the variables. How could I achieve this?

Upvotes: 1

Views: 155

Answers (1)

Shree
Shree

Reputation: 11150

Here's one way with dplyr -

modelsummary %>% 
  mutate(
    variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>% 
      n_distinct()
  )

          term mod_id variables
1  (Intercept)      1         3
2       month1      1         3
3       month2      1         3
4     RateDiff      1         3
5         var1      1         3
6         var2      1         3
7         var3      1         3
8  (Intercept)      2         3
9       month1      2         3
10        var1      2         3
11        var2      2         3
12        var3      2         3

Or with dplyr and stringr:

modelsummary %>%
  mutate(
    variables = str_subset(tolower(term), "intercept|month|ratediff", TRUE) %>% 
      n_distinct()
  )

Add group_by(mod_id) before the mutate if you want to count the number of variables per mod_id.

In base R -

modelsummary$variables <- with(modelsummary, 
               term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>% 
               unique() %>% length()
               )

Upvotes: 2

Related Questions