Reputation: 1137
I would like to organise a data frame by the contents of three of its six columns (a minimal example with just those three is below) and have each unique combination of values over those three columns returned as a subsetted data frame inside a list. In other words, I want to chop the data frame up into smaller data frames and put them into a list.
var1 <- "erg11"
var2 <- "cyp51"
df <- data.frame(primerID=c(1,2,3,2,4,3,2,1,1,1,2),geneName=c(var1,var1,var2,var1,var1,var2,var2,var2,var1,var2,var1),insertLength=c(111,111,81,81,81,111,102,111,81,81,102))
Given my old C background, I tried nested for loops, subsetting the data frame whenever all three values of a row were found in three lists, e.g.:
Alist <- as.list(unique(df$primerID))
Blist <- as.list(unique(df$geneName))
Clist <- as.list(unique(df$insertLength))
uniqueCounter <- 1
uniqueList <- list()
for (i in seq_along(Alist)) {
  for (k in seq_along(Blist)) {
    for (n in seq_along(Clist)) {
      # rows matching this combination of the three values (index k, not j)
      indDF <- subset(df, df$primerID %in% Alist[i] & df$geneName %in% Blist[k] & df$insertLength %in% Clist[n])
      if (nrow(indDF) > 0) {
        # [[ ]] stores the whole data frame as a single list element
        uniqueList[[uniqueCounter]] <- indDF
        uniqueCounter <- uniqueCounter + 1
      }
    }
  }
}
However, this takes most of the night to run.
Thanks
Upvotes: 2
Views: 1139
Reputation: 73415
You can give a list of factors as the grouping variable, so that their interaction is used for grouping. Since all your data frame columns are grouping variables, we can do split(df, df).
Optionally do split(df, df, drop = TRUE), which drops groups with no records / cases.
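As a rough sketch of the difference on the example data from the question (the names groups_all and groups are only illustrative, not anything from the original post):

```r
var1 <- "erg11"
var2 <- "cyp51"
df <- data.frame(primerID = c(1, 2, 3, 2, 4, 3, 2, 1, 1, 1, 2),
                 geneName = c(var1, var1, var2, var1, var1, var2, var2, var2, var1, var2, var1),
                 insertLength = c(111, 111, 81, 81, 81, 111, 102, 111, 81, 81, 102))

# Every combination of levels becomes a group, including empty ones
groups_all <- split(df, df)

# drop = TRUE keeps only the combinations that actually occur in the data
groups <- split(df, df, drop = TRUE)

length(groups_all)  # 4 primer IDs * 2 genes * 3 lengths = 24
length(groups)      # 11, since every row here is its own combination
names(groups)       # names are built from the level values joined by "."
```

Each element of the result is an ordinary data frame, so groups[[1]] or lapply(groups, nrow) work as usual.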
Just read that your real data frame has 6 columns, 3 of which are for grouping. Suppose the grouping columns are 1, 3, 4; then we can use split(df, df[c(1, 3, 4)]).
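A minimal sketch of the six-column case, assuming a made-up data frame df6 in which the columns sample, score and run are hypothetical fillers and the grouping columns sit at positions 1, 3 and 4. Selecting the grouping columns by name instead of position is just a personal preference here:

```r
# Hypothetical six-column data frame; only primerID, geneName and
# insertLength come from the question, the other columns are invented
df6 <- data.frame(primerID = c(1, 1, 2, 2, 1),
                  sample = c("s1", "s2", "s3", "s4", "s5"),
                  geneName = c("erg11", "erg11", "cyp51", "cyp51", "erg11"),
                  insertLength = c(111, 111, 81, 102, 111),
                  score = c(0.9, 0.8, 0.7, 0.6, 0.5),
                  run = c(1L, 1L, 2L, 2L, 3L))

# Group on three columns only; each piece still carries all six columns
group_cols <- c("primerID", "geneName", "insertLength")
groups6 <- split(df6, df6[group_cols], drop = TRUE)

length(groups6)             # 3 distinct combinations
groups6[["1.erg11.111"]]    # the three rows sharing primerID 1, erg11, 111
```

Because the grouping is done once on the interaction of the three columns, there is no need to loop over every possible combination of values.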
From ?split:
Description:
‘split’ divides the data in the vector ‘x’ into the groups defined
by ‘f’. The replacement forms replace values corresponding to
such a division. ‘unsplit’ reverses the effect of ‘split’.
Arguments:
x: vector or data frame containing values to be divided into
groups.
f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
grouping, or a list of such factors in which case their
interaction is used for the grouping.
drop: logical indicating if levels that do not occur should be
dropped (if ‘f’ is a ‘factor’ or a list).
Upvotes: 3