Downsampling using purrr. Unique identifier

Question

I want to use purrr to group my data by a unique identifier (HUC12) and then downsample a factor variable (stream) within each group using downSample from the caret package. Here is my attempt:

out <- train %>% select(stream, HUC12) %>% 
  na.omit() %>% group_by(HUC12) %>% 
  nest %>% mutate(prop = map(data, ~downSample(.x, factor('stream'))))

Any help would be much appreciated. Here's some sample data.

train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE, 
                    prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))

StupidWolf · Accepted Answer

Generate data:

set.seed(100)
train <- data.frame(stream = factor(sample(x= 0:1, size = 100, replace = TRUE, 
                    prob = c(0.25,.75))), HUC12 = rep(c("a","b","c","d")))

Try something like this. Because downSample returns a data.frame, we can use dplyr's do() to run the downsampling within each group.

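library(dplyr)
library(caret)  # provides downSample()
# Downsample within each HUC12 group, using stream as the class variable
down_train <- train %>% select(stream, HUC12) %>%
  na.omit() %>% group_by(HUC12) %>%
  do(downSample(., .$stream))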

We can check:

down_train %>% count(HUC12,stream)

# A tibble: 8 x 3
# Groups:   HUC12 [4]
  HUC12 stream     n
1 a     0          1
2 a     1          1
3 b     0          4
4 b     1          4
5 c     0         11
6 c     1         11
7 d     0          8
8 d     1          8

And in the original data:

train %>% count(HUC12,stream)
# A tibble: 8 x 3
  HUC12 stream     n
1 a     0          1
2 a     1         24
3 b     0          4
4 b     1         21
5 c     0         11
6 c     1         14
7 d     0          8
8 d     1         17
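
Since the question asked about purrr specifically, here is a minimal sketch of the same idea with map(). It assumes the train data generated above; splitting on HUC12 with split() and stacking the pieces with bind_rows() is my own choice for illustration, not something from the original attempt. Note that downSample appends a copy of the class variable as a column named Class by default.

```r
library(dplyr)
library(purrr)
library(caret)

set.seed(100)
train <- data.frame(
  stream = factor(sample(x = 0:1, size = 100, replace = TRUE,
                         prob = c(0.25, 0.75))),
  HUC12 = rep(c("a", "b", "c", "d"), length.out = 100)
)

down_train <- train %>%
  select(stream, HUC12) %>%
  na.omit() %>%
  split(.$HUC12) %>%                           # one data.frame per HUC12 value
  map(~ downSample(x = .x, y = .x$stream)) %>% # balance stream within each piece
  bind_rows()                                  # stack the pieces back together

# Each HUC12 group should now have equal counts of stream 0 and 1
down_train %>% count(HUC12, stream)
```

The result should match the do() version in structure: for each HUC12, both stream levels are reduced to the size of the smaller class. The exact rows kept depend on the random draw.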
