Anthony Nash

Reputation: 1137

How to group in R and return as a list?

I would like to organise a data frame by the contents of three of its six columns (a minimal example with just those three is below) and return each unique combination of values across those three columns as a subset data frame inside a list. Basically, I want to chop the data frame up into smaller data frames and put them into a list.

var1 <- "erg11"
var2 <- "cyp51"
df <- data.frame(primerID=c(1,2,3,2,4,3,2,1,1,1,2),geneName=c(var1,var1,var2,var1,var1,var2,var2,var2,var1,var2,var1),insertLength=c(111,111,81,81,81,111,102,111,81,81,102))

Given my C background, I tried nested for loops, subsetting the data frame to the rows whose values in all three columns matched the current element of each of three lists, e.g.:

Alist <- as.list(unique(df$primerID))
Blist <- as.list(unique(df$geneName))
Clist <- as.list(unique(df$insertLength))

uniqueCounter <- 1
uniqueList <- list()

for(i in seq_along(Alist)) {
  for(k in seq_along(Blist)) {
    for(n in seq_along(Clist)) {
      # keep the rows matching the current value of each of the three columns
      indDF <- subset(df, primerID == Alist[[i]] & geneName == Blist[[k]] & insertLength == Clist[[n]])
      if(nrow(indDF) > 0) {
        # [[ ]] stores the whole data frame as a single list element
        uniqueList[[uniqueCounter]] <- indDF
        uniqueCounter <- uniqueCounter + 1
      }
    }
  }
}

However, this takes most of the night to run.

Thanks

Upvotes: 2

Views: 1139

Answers (1)

Zheyuan Li

Reputation: 73415

You can pass a list of factors as the grouping variable, in which case their interaction is used for grouping. A data frame is a list of columns, so since all the columns of your data frame are grouping variables, we can simply do split(df, df).

Optionally, use split(df, df, drop = TRUE), which drops the groups that have no records / cases.
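A minimal sketch on the question's example data, assuming R 4.x (where data.frame() keeps geneName as character; older versions convert it to a factor, which split() handles the same way). Without drop = TRUE, split() returns one element for every combination of levels, here 4 × 2 × 3 = 24, most of them empty; with drop = TRUE only the 11 combinations that actually occur are kept. In this small example each occurring combination happens to be a single row.

```r
var1 <- "erg11"
var2 <- "cyp51"
df <- data.frame(primerID = c(1, 2, 3, 2, 4, 3, 2, 1, 1, 1, 2),
                 geneName = c(var1, var1, var2, var1, var1, var2, var2, var2, var1, var2, var1),
                 insertLength = c(111, 111, 81, 81, 81, 111, 102, 111, 81, 81, 102))

# Every combination of levels: 4 primers x 2 genes x 3 lengths = 24 groups
allGroups <- split(df, df)
length(allGroups)                    # 24
sum(sapply(allGroups, nrow) == 0)    # 13 of them are empty

# drop = TRUE keeps only the combinations present in the data
uniqueList <- split(df, df, drop = TRUE)
length(uniqueList)                   # 11

# Groups are named by their levels joined with "."
uniqueList[["1.erg11.111"]]
```

This single call replaces the three nested loops from the question.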

I just read that your real data frame has 6 columns, 3 of which are for grouping. Supposing the grouping columns are 1, 3 and 4, we can use split(df, df[c(1, 3, 4)]).
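A sketch of the six-column case. The columns runID, plate and score are hypothetical stand-ins for the real data; only the grouping positions 1, 3 and 4 come from the answer. Each resulting piece keeps all six columns, and selecting the grouping columns by name instead of position gives the same result while being less fragile if the columns are ever reordered.

```r
# Hypothetical 6-column data: runID, plate and score are made up for illustration
df6 <- data.frame(primerID = c(1, 2, 3, 2, 4, 3, 2, 1, 1, 1, 2),
                  runID = 1:11,
                  geneName = c("erg11", "erg11", "cyp51", "erg11", "erg11", "cyp51",
                               "cyp51", "cyp51", "erg11", "cyp51", "erg11"),
                  insertLength = c(111, 111, 81, 81, 81, 111, 102, 111, 81, 81, 102),
                  plate = rep(c("A", "B"), length.out = 11),
                  score = 11:1)

# Group by columns 1, 3 and 4 only; every piece still has all six columns
byPosition <- split(df6, df6[c(1, 3, 4)], drop = TRUE)

# Same grouping, selecting the columns by name
byName <- split(df6, df6[c("primerID", "geneName", "insertLength")], drop = TRUE)

identical(byPosition, byName)   # TRUE
ncol(byPosition[[1]])           # 6
```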


From ?split:

Description:

 ‘split’ divides the data in the vector ‘x’ into the groups defined
 by ‘f’.  The replacement forms replace values corresponding to
 such a division.  ‘unsplit’ reverses the effect of ‘split’.

Arguments:

   x: vector or data frame containing values to be divided into
      groups.

   f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
      grouping, or a list of such factors in which case their
      interaction is used for the grouping.

drop: logical indicating if levels that do not occur should be
      dropped (if ‘f’ is a ‘factor’ or a list).
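To illustrate the description of f above: when f is a list (such as a data frame), split() combines the factors with interaction(), so the group names are the interaction levels joined with "." (the first factor varying fastest), and splitting on an explicit interaction() gives an identical result. A small sketch with made-up data:

```r
df <- data.frame(primerID = c(1, 2, 1, 2),
                 geneName = c("erg11", "erg11", "cyp51", "erg11"))

# A list of factors: split() groups by their interaction
groups <- split(df, df, drop = TRUE)
names(groups)                          # "1.cyp51" "1.erg11" "2.erg11"
levels(interaction(df, drop = TRUE))   # the same three levels

# Splitting on the explicit interaction gives the same list
identical(groups, split(df, interaction(df, drop = TRUE)))   # TRUE

# Without drop = TRUE the unused combination "2.cyp51" appears as an empty group
sapply(split(df, df), nrow)
```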

Upvotes: 3
