Subsetting a data frame by a value of one of its columns

Question

I have a rather large data frame. Here is a simplified example:

Group Element Value Note
1     AAA     11    Good
1     ABA     12    Good
1     AVA     13    Good
2     CBA     14    Good
2     FDA     14    Good
3     JHA     16    Good
3     AHF     16    Good
3     AKF     17    Good

Here it is as a dput:

dat <- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), Element = structure(c(1L, 
2L, 5L, 6L, 7L, 8L, 3L, 4L), .Label = c("AAA", "ABA", "AHF", 
"AKF", "AVA", "CBA", "FDA", "JHA"), class = "factor"), Value = c(11L, 
12L, 13L, 14L, 14L, 16L, 16L, 17L), Note = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Good", class = "factor")), .Names = c("Group", 
"Element", "Value", "Note"), class = "data.frame", row.names = c(NA, 
-8L))

I'm trying to separate it based on the group. So, let's say

Group 1 will be a data frame:

Group Element Value Note
1     AAA     11    Good
1     ABA     12    Good
1     AVA     13    Good

Group 2:

2     CBA     14    Good
2     FDA     14    Good

and so on.

Rich Scriven · Accepted Answer

You can use split for this.

> dat
##   Group Element Value Note
## 1     1     AAA    11 Good
## 2     1     ABA    12 Good
## 3     1     AVA    13 Good
## 4     2     CBA    14 Good
## 5     2     FDA    14 Good
## 6     3     JHA    16 Good
## 7     3     AHF    16 Good
## 8     3     AKF    17 Good

> x <- split(dat, dat$Group)

Then you can access each individual data frame by position with x[[1]], x[[2]], etc., or by group name with x[["1"]], x[["2"]], etc.
For example, here is group 2:

> x[[2]]  ## x[2] would instead return a one-element list
##   Group Element Value Note
## 4     2     CBA    14 Good
## 5     2     FDA    14 Good
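
To illustrate how the list returned by split can be indexed, here is a minimal sketch. It rebuilds the example data with data.frame rather than the dput output, so Element and Note may be character rather than factor depending on the R version; the name g2 is only for illustration.

```r
# Rebuild the example data (same values as in the question)
dat <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  Element = c("AAA", "ABA", "AVA", "CBA", "FDA", "JHA", "AHF", "AKF"),
  Value   = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
  Note    = "Good"
)

x <- split(dat, dat$Group)

names(x)         # the list names are the group values, as character strings
g2 <- x[["2"]]   # select the data frame for group 2 by name
class(x[[2]])    # double brackets extract the data frame itself
class(x[2])      # single brackets keep a list of length one
```

Selecting by name is probably the safer choice when the group values are not consecutive integers starting at 1, since position and group value then no longer coincide.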

ADD: Since you asked about it in the comments, you can write each individual data frame to a file with write.csv and lapply. The invisible wrapper is simply there to suppress the output of lapply.

> invisible(lapply(seq_along(x), function(i){
      write.csv(x[[i]], file = paste0(i, ".csv"), row.names = FALSE)
  }))

We can see that the files were created by looking at list.files:

> list.files(pattern = "^[0-9]\\.csv$")
## [1] "1.csv" "2.csv" "3.csv"

And we can see the data frame of the third group with read.csv:

> read.csv("3.csv")
##   Group Element Value Note
## 1     3     JHA    16 Good
## 2     3     AHF    16 Good
## 3     3     AKF    17 Good
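
As a variant of the file-writing step above, here is a hedged sketch that names each file after the group value instead of the list position. The output directory (a subfolder of tempdir()) and the "group_" file prefix are assumptions chosen for illustration, not part of the original answer.

```r
# Rebuild the example data (same values as in the question)
dat <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  Element = c("AAA", "ABA", "AVA", "CBA", "FDA", "JHA", "AHF", "AKF"),
  Value   = c(11L, 12L, 13L, 14L, 14L, 16L, 16L, 17L),
  Note    = "Good"
)

x <- split(dat, dat$Group)

# Hypothetical output location; replace with a real directory as needed
out_dir <- file.path(tempdir(), "groups")
dir.create(out_dir, showWarnings = FALSE)

# Loop over the group names so each file name reflects the group value
for (g in names(x)) {
  write.csv(x[[g]],
            file = file.path(out_dir, paste0("group_", g, ".csv")),
            row.names = FALSE)
}

files <- list.files(out_dir, pattern = "^group_.*\\.csv$")
third <- read.csv(file.path(out_dir, "group_3.csv"))
```

A plain for loop is used here because the calls are made only for their side effect, so there is no lapply result to suppress.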
