Oliver

Reputation: 8572

Grouped matching

In a specific problem, I am trying to match a number/string to various subgroups.

set.seed(1)
n <- 1e5
groups <- list(0, 1:3, 5:9, 10:33,  
               35, 36:39, 41:43,
               45:47, 49:53,
               55:56, 58:63,
               64:65, 68, 69:75, 
               77:82, 84, 85,
               86:89, 90:93,
               94:96, 97:98, 99)
dat <- sample(unlist(groups), n, TRUE)

So I want to know which element of 'groups' each value of dat belongs to. One method would be to use *apply or the equivalent for loop:

out <- integer(n)
for(i in seq_along(out))
  out[i] <- which(sapply(groups, function(x)dat[i] %in% x))
table(out)
#out
#    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22 
# 1144  3384  5587 26892  1165  4501  3348  3299  5702  2177  6751  2218  1134  7810  6792  1106  1091  4526  4606  3370  2246  1151

But is there a more concise method?
Note that the final result is out; the table is only there to visualize the matching. That is, the final result should match out and not table(out).

Upvotes: 2

Views: 103

Answers (2)

akrun

Reputation: 887118

Another option is to stack or enframe the groups into a single dataset and then do the count:

library(dplyr)
stack(setNames(groups, seq_along(groups))) %>%
     group_by(ind) %>% 
     summarise(Count = sum(dat %in% values))
# A tibble: 22 x 2
#   ind   Count
#   <fct> <int>
# 1 1      1144
# 2 2      3384
# 3 3      5587
# 4 4     26892
# 5 5      1165
# 6 6      4501
# 7 7      3348
# 8 8      3299
# 9 9      5702
#10 10     2177
# … with 12 more rows

Or with enframe

library(tibble)
library(tidyr)
enframe(groups) %>%
    unnest(c(value)) %>%
    group_by(name) %>%
    summarise(Count = sum(dat %in% value))
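
Both pipelines above return the counts per group. If the per-element vector out from the question is needed instead, the stacked data frame can probably serve as a lookup via match(). A minimal sketch, assuming groups and dat as defined in the question; s is just a name chosen here for illustration:

```r
set.seed(1)
n <- 1e5
groups <- list(0, 1:3, 5:9, 10:33, 35, 36:39, 41:43, 45:47, 49:53,
               55:56, 58:63, 64:65, 68, 69:75, 77:82, 84, 85,
               86:89, 90:93, 94:96, 97:98, 99)
dat <- sample(unlist(groups), n, TRUE)

# One row per (value, group index) pair; ind holds the group index as a factor
s <- stack(setNames(groups, seq_along(groups)))

# match() finds the row of each dat value; take that row's group index
out <- as.integer(as.character(s$ind))[match(dat, s$values)]
```

Any value of dat that appears in no group would come back as NA here, because match() returns NA for it.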

Upvotes: 1

GKi

Reputation: 39657

There is no need for the for loop. To get the counts you can use dat directly.

# For each group, the positions in dat whose value belongs to it
i <- lapply(groups, function(x) which(dat %in% x))
out <- integer(length(dat))
out[unlist(i)] <- rep(seq_along(i), lengths(i))
table(out)
#out
#    1     2     3     4     5     6     7     8     9    10    11    12    13 
# 1144  3384  5587 26892  1165  4501  3348  3299  5702  2177  6751  2218  1134 
#   14    15    16    17    18    19    20    21    22 
# 7810  6792  1106  1091  4526  4606  3370  2246  1151 

You can also use a lookup table:

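# lup[v + 1] holds the group index of value v (+1 because the values start at 0)
lup <- integer(0)
lup[unlist(groups)+1] <- rep(seq_along(groups), lengths(groups))
# Values that belong to no group (e.g. 4) map to NA
out <- lup[dat+1]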
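
Because every group here is a contiguous range and the groups are listed in increasing order, findInterval() on the group minima might be another option. This is only a sketch under those two assumptions, not something the question guarantees in general; starts and ends are names introduced here, and the last two assignments set values outside every group (such as 4 or 34) to NA:

```r
set.seed(1)
n <- 1e5
groups <- list(0, 1:3, 5:9, 10:33, 35, 36:39, 41:43, 45:47, 49:53,
               55:56, 58:63, 64:65, 68, 69:75, 77:82, 84, 85,
               86:89, 90:93, 94:96, 97:98, 99)
dat <- sample(unlist(groups), n, TRUE)

# Assumption: each group is a contiguous range and the groups are sorted
starts <- vapply(groups, min, numeric(1))
ends   <- vapply(groups, max, numeric(1))
stopifnot(!is.unsorted(starts))

out <- findInterval(dat, starts)      # index of the last start <= value
out[out == 0L] <- NA                  # values below the first group
out[which(dat > ends[out])] <- NA     # values in a gap between groups
```

If the groups were not contiguous or not sorted, the lookup table above would be the safer choice.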

Upvotes: 3
