Bernard

Reputation: 67

Split a data frame by column using a list of vectors as the column index

I have a data frame V:

> V
          1        2        3        4        5        6        7        8        9       10
1  2.912543 2.570664 3.341646 3.225278 3.131639 3.052497 3.117737 3.429533 3.392248 2.847380
2  2.891564 2.698348 3.035995 2.898063 2.808887 2.850897 3.217016 2.826621 3.229053 2.698508
3  3.214684 2.644645 3.160234 2.923109 3.230461 2.961171 3.129343 3.024775 2.714332 3.324411
4  2.919603 3.023168 3.070867 2.994575 2.947305 2.964142 3.278173 3.131523 2.788786 3.239060
5  2.792197 3.316468 2.915747 3.155218 3.315128 2.759656 2.630333 3.232530 2.920433 3.016210
6  2.902794 3.294973 3.229803 3.351397 3.269347 2.609505 3.035035 2.919629 2.919356 2.649507
7  3.049518 3.107500 2.857238 3.331793 3.322184 2.904852 3.335267 3.215756 3.079802 3.102080
8  3.083056 3.281189 3.070641 2.848449 2.961288 2.683630 3.153762 3.119757 3.103300 3.189348
9  2.775359 3.057107 3.217315 3.388652 2.984062 3.395337 2.896535 3.284888 2.589920 2.882975
10 2.540940 2.844450 3.332348 2.767093 2.962410 2.957737 2.929318 3.080653 3.103251 3.315891

and a list of vectors, ind. The combined length of the vectors is equal to the number of columns in the data frame.

> ind 
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5 6 7 8

[[3]]
[1]  9 10

How can I split the data frame V by column into multiple data frames, using each vector in the list ind as the column index of one sub data frame, and return them as a list? The output should look like this:

[[1]]
          1        2        3
1  2.912543 2.570664 3.341646
2  2.891564 2.698348 3.035995
3  3.214684 2.644645 3.160234
4  2.919603 3.023168 3.070867
5  2.792197 3.316468 2.915747
6  2.902794 3.294973 3.229803
7  3.049518 3.107500 2.857238
8  3.083056 3.281189 3.070641
9  2.775359 3.057107 3.217315
10 2.540940 2.844450 3.332348

[[2]]
          4        5        6        7        8
1  3.225278 3.131639 3.052497 3.117737 3.429533
2  2.898063 2.808887 2.850897 3.217016 2.826621
3  2.923109 3.230461 2.961171 3.129343 3.024775
4  2.994575 2.947305 2.964142 3.278173 3.131523
5  3.155218 3.315128 2.759656 2.630333 3.232530
6  3.351397 3.269347 2.609505 3.035035 2.919629
7  3.331793 3.322184 2.904852 3.335267 3.215756
8  2.848449 2.961288 2.683630 3.153762 3.119757
9  3.388652 2.984062 3.395337 2.896535 3.284888
10 2.767093 2.962410 2.957737 2.929318 3.080653

[[3]]
          9       10
1  3.392248 2.847380
2  3.229053 2.698508
3  2.714332 3.324411
4  2.788786 3.239060
5  2.920433 3.016210
6  2.919356 2.649507
7  3.079802 3.102080
8  3.103300 3.189348
9  2.589920 2.882975
10 3.103251 3.315891
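To make the question reproducible, here is a minimal sketch of inputs with the same shape. The values are random stand-ins generated with runif(), not the data printed above. The check at the end encodes the stated assumption that ind uses every column exactly once.

```r
# Hypothetical stand-in for V: random values, NOT the data shown above
set.seed(1)
V <- as.data.frame(matrix(runif(100, min = 2.5, max = 3.5), nrow = 10))
names(V) <- 1:10          # column names "1" to "10", as in the printout
ind <- list(1:3, 4:8, 9:10)

# Check that ind partitions the columns: every column used exactly once
cols <- unlist(ind)
stopifnot(
  length(cols) == ncol(V),
  !anyDuplicated(cols),
  all(cols %in% seq_len(ncol(V)))
)
```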

Upvotes: 1

Views: 81

Answers (3)

Florian

Reputation: 1258

You could also use tidyr. You can assign your columns by index or by column name:

library(tidyverse)

mtcars %>% 
  tidyr::nest(first_col = c(mpg, cyl)) %>% 
  tidyr::nest(second_col = c(disp, hp)) %>% 
  tidyr::nest(third_col = c(1:3))
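The snippet above is written for mtcars rather than the question's V. As a sketch of the "by column name" idea applied to V without extra packages, the positional index in ind can be translated into names and used with single-bracket indexing. The stand-in V and the name ind_names are hypothetical.

```r
# Hypothetical stand-in for V (random values, same column names as the question)
V <- as.data.frame(matrix(runif(20), nrow = 2))
names(V) <- 1:10
ind <- list(1:3, 4:8, 9:10)

# Translate the positional index into column names ...
ind_names <- lapply(ind, function(i) names(V)[i])
# ... then select by name; V[nm] (list-style indexing) always returns a data frame
parts <- lapply(ind_names, function(nm) V[nm])

parts[[3]]
```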

Upvotes: 0

user2974951

Reputation: 10375

Quite simply

> lapply(ind, function(x) V[,x])

[[1]]
         X1       X2       X3
1  2.912543 2.570664 3.341646
2  2.891564 2.698348 3.035995
3  3.214684 2.644645 3.160234
4  2.919603 3.023168 3.070867
5  2.792197 3.316468 2.915747
6  2.902794 3.294973 3.229803
7  3.049518 3.107500 2.857238
8  3.083056 3.281189 3.070641
9  2.775359 3.057107 3.217315
10 2.540940 2.844450 3.332348

[[2]]
         X4       X5       X6       X7       X8
1  3.225278 3.131639 3.052497 3.117737 3.429533
2  2.898063 2.808887 2.850897 3.217016 2.826621
3  2.923109 3.230461 2.961171 3.129343 3.024775
4  2.994575 2.947305 2.964142 3.278173 3.131523
5  3.155218 3.315128 2.759656 2.630333 3.232530
6  3.351397 3.269347 2.609505 3.035035 2.919629
7  3.331793 3.322184 2.904852 3.335267 3.215756
8  2.848449 2.961288 2.683630 3.153762 3.119757
9  3.388652 2.984062 3.395337 2.896535 3.284888
10 2.767093 2.962410 2.957737 2.929318 3.080653

[[3]]
         X9      X10
1  3.392248 2.847380
2  3.229053 2.698508
3  2.714332 3.324411
4  2.788786 3.239060
5  2.920433 3.016210
6  2.919356 2.649507
7  3.079802 3.102080
8  3.103300 3.189348
9  2.589920 2.882975
10 3.103251 3.315891
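One caveat with V[,x]: if a vector in ind has length 1, the result is a plain vector instead of a one-column data frame. A minimal sketch of the guard, assuming a hypothetical ind that contains single-column groups, is to pass drop = FALSE. The group labels are illustrative only.

```r
# Hypothetical 3 x 10 stand-in for V
V <- as.data.frame(matrix(runif(30), nrow = 3))
names(V) <- 1:10
ind <- list(1:3, 4:8, 9, 10)  # note the single-column groups

# V[, x] would return a plain vector when x has length 1;
# drop = FALSE keeps every piece as a data frame (V[x] would also work)
parts <- lapply(ind, function(x) V[, x, drop = FALSE])
names(parts) <- paste0("group", seq_along(parts))  # hypothetical labels

sapply(parts, ncol)
```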

Upvotes: 4

Ronak Shah

Reputation: 388862

We can use split.default

split.default(V, rep(seq_along(ind), lengths(ind)))

#$`1`
#      1    2    3
#1  2.91 2.57 3.34
#2  2.89 2.70 3.04
#3  3.21 2.64 3.16
#4  2.92 3.02 3.07
#5  2.79 3.32 2.92
#...

#$`2`
#      4    5    6    7    8
#1  3.23 3.13 3.05 3.12 3.43
#2  2.90 2.81 2.85 3.22 2.83
#3  2.92 3.23 2.96 3.13 3.02
#4  2.99 2.95 2.96 3.28 3.13
#5  3.16 3.32 2.76 2.63 3.23
#...

#$`3`
#      9   10
#1  3.39 2.85
#2  3.23 2.70
#3  2.71 3.32
#4  2.79 3.24
#5  2.92 3.02
#...
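The grouping vector rep(seq_along(ind), lengths(ind)) evaluates to 1 1 1 2 2 2 2 2 3 3, which labels columns 1 to 10 in order. It therefore matches ind only when the groups are contiguous and sorted, as in the question. A sketch for an arbitrary partition, using a hypothetical non-contiguous ind, places each group label at the columns that group names. Within each piece, columns come out in data-frame order, not in the order listed in ind. Any column missing from ind would stay labelled 0 and land in an extra group "0".

```r
# Hypothetical 2 x 10 stand-in for V
V <- as.data.frame(matrix(seq_len(20), nrow = 2))
names(V) <- 1:10
ind <- list(c(1, 4, 7), c(2, 3, 10), c(5, 6, 8, 9))  # non-contiguous groups

# Label each column with the index of the group that contains it
f <- integer(ncol(V))
f[unlist(ind)] <- rep(seq_along(ind), lengths(ind))

# split.default treats the data frame as a list of columns
parts <- split.default(V, f)
lapply(parts, names)
```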

Upvotes: 1
