Filippo Marolla

Reputation: 95

Subset data in a for loop - R

How can I do the following operation in a for loop?

x <- runif(30, 0, 1)
sub_1 <- x[1:10]
sub_2 <- x[11:20]
sub_3 <- x[21:30]

That is, I want to create three different objects, each one with a subset of the initial vector without writing many lines of code.

This is of course a simple case that I created to make things easier. In my real case I have a big dataset that I want to subset in a similar fashion and assign names to the subsets. I am asking for a for loop because I struggle to understand the logic behind for loops and I rarely succeed at making them work, so extra explanations are welcome.

Upvotes: 0

Views: 83

Answers (3)

DJack

Reputation: 4940

If you really want to use a for loop you could do:

for (i in 1:3) {
  start <- 1 + (10 * (i - 1))
  end <- 10 * i
  assign(paste0("sub_", i), x[start:end])
}

But split is more concise and avoids creating separate objects with assign:

sub <- split(x, rep(1:3, each = 10))

The outputs are identical, except that split returns them as the elements of a single list:

identical(sub[[1]], sub_1)
# [1] TRUE
identical(sub[[2]], sub_2)
# [1] TRUE
identical(sub[[3]], sub_3)
# [1] TRUE
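Since the question asks about the logic of the loop, here is a minimal sketch that spells out the index arithmetic and stores the pieces in a list instead of calling assign. The names chunk_size, n_chunks and subs are made up for illustration, and set.seed is only there to make the example reproducible.

```r
set.seed(1)                        # only so the example is reproducible
x <- runif(30, 0, 1)

chunk_size <- 10                   # illustrative name: elements per subset
n_chunks <- length(x) / chunk_size # 3 here; assumes length(x) is a multiple of 10
subs <- vector("list", n_chunks)   # preallocate an empty list with 3 slots

for (i in seq_len(n_chunks)) {
  start <- 1 + chunk_size * (i - 1)  # i = 1 -> 1,  i = 2 -> 11, i = 3 -> 21
  end   <- chunk_size * i            # i = 1 -> 10, i = 2 -> 20, i = 3 -> 30
  subs[[i]] <- x[start:end]          # store the i-th slice in the i-th slot
}
names(subs) <- paste0("sub_", seq_along(subs))
```

On each pass the loop variable i takes the next value (1, then 2, then 3), and start and end are recomputed from it. Afterwards subs$sub_2 holds the same values as x[11:20].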

Upvotes: 2

s_baldur

Reputation: 33488

To do this literally you could do something like:

for (i in 1:3) {
  assign(paste0("sub_", i), x[(10*i - 9):(10*i)])
}

But which method is best really depends on your application.
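The question mentions a larger dataset, so here is a rough sketch of the same loop applied to the rows of a data frame. The data frame df, its columns and the df_sub_ prefix are hypothetical, and the sketch assumes the number of rows is exactly 30.

```r
# Hypothetical data frame standing in for the real dataset
df <- data.frame(id = 1:30, value = (1:30) / 10)

for (i in 1:3) {
  rows <- (10 * i - 9):(10 * i)             # 1:10, 11:20, 21:30
  assign(paste0("df_sub_", i), df[rows, ])  # creates df_sub_1, df_sub_2, df_sub_3
}

# Objects created with assign can be collected again by name with mget
all_subs <- mget(paste0("df_sub_", 1:3))
```

Having to call mget afterwards is a hint that a list may be the more convenient container if the subsets are going to be processed together.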

Upvotes: 1

Parfait

Reputation: 107632

Consider building a list of subsets rather than flooding your global environment with many similar variables. Specifically, use lapply over the starting indices generated by seq (every 10th position), then name the list elements accordingly:

seq_10 <- lapply(seq(1, length(x), 10), function(i) x[i:(i + 9)])
names(seq_10) <- paste0("sub_", seq_along(seq_10))

seq_10$sub_1
# [1] 0.6091323 0.5677653 0.7335186 0.8586379 0.7416119 0.4835484 0.2038851 0.3027926 0.3422036 0.8959509

seq_10$sub_2
# [1] 0.001431539 0.679949988 0.764357517 0.988070806 0.381550391 0.251816226 0.221106522 0.111756309 0.038826363 0.625358723

seq_10$sub_3
# [1] 0.7057926 0.1263321 0.5020490 0.8753861 0.9165018 0.2342572 0.1488096 0.1639103 0.9840052 0.6850799

Alternatively, use split with a factor of groupings, as @Harlan shows in this SO answer, which yields exactly the same result as the solution above:

split_10 <- split(x, ceiling(seq_along(x) / 10))
names(split_10) <- paste0("sub_", seq_along(split_10))

all.equal(seq_10, split_10)
# [1] TRUE

identical(seq_10, split_10)
# [1] TRUE
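One point that may matter for real data: when the length is not a multiple of 10, the ceiling grouping simply makes the last group shorter, whereas indexing past the end of a vector, as x[i:(i + 9)] would do for the last chunk, pads the result with NA. A small sketch with a hypothetical 25-element vector y:

```r
y <- seq_len(25)                                 # hypothetical vector, length not a multiple of 10
parts <- split(y, ceiling(seq_along(y) / 10))    # groups 1, 2, 3 with 10, 10 and 5 elements
names(parts) <- paste0("sub_", seq_along(parts))

lengths(parts)                                   # number of elements in each subset
```

Here parts$sub_3 contains only the five remaining values, 21 through 25, with no NA padding.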

Upvotes: 1
