Reputation: 2091
Is there a smart way to select columns from a data frame based on quantiles of the column sums? For example, select only the columns whose column sum falls in the first quartile. I can subset data based on column sums, and I can calculate quantiles of column sums, but is there a good way to combine these? Thanks.
# e.g. subset data - select columns whose column sums are less than 5
mydata <- mydata[,colSums(mydata) < 5]
# e.g. compute quartiles of the column sums
mydata_cs <- colSums(mydata)
quart.mydata_cs <- quantile(mydata_cs,probs=seq(0,1, by=0.25))
Upvotes: 0
Views: 4893
Reputation: 15441
x <- c(1,2,3,4,5)
y <- c(4,6,9,2,9)
df <- data.frame(x,y)
q <- quantile(colSums(df),probs=seq(0,1, by=0.25))
df[,colSums(df) < q[2] ,drop=FALSE]
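Since the quantile vector holds all five breakpoints, the same idea can be extended to pick out any quartile band, not only the lowest. The following is a minimal sketch, assuming a made-up four-column data frame (the names df, cs, breaks, band and second are illustrative only). It labels each column with its quartile band using cut(). Note that cut() raises an error if the breakpoints are not unique, which can happen when many column sums are tied.

```r
# Hypothetical toy data: column sums are a = 3, b = 7, c = 11, d = 15
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
cs <- colSums(df)

# Quartile breakpoints of the column sums (0%, 25%, 50%, 75%, 100%)
breaks <- quantile(cs, probs = seq(0, 1, by = 0.25))

# Assign each column to a band 1..4; include.lowest = TRUE keeps the
# column with the minimum sum in band 1 instead of turning it into NA
band <- cut(cs, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Select the columns in the second quartile band;
# drop = FALSE keeps the result a data frame even for a single column
second <- df[, band == 2, drop = FALSE]
names(second)
```

Changing band == 2 to another band number, or to something like band %in% c(1, 4), selects other groups of columns.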
Upvotes: 1
Reputation: 3866
Using your mydata_cs, the following should work:
mydata.firstquart <- mydata[,mydata_cs < quantile(mydata_cs,0.25)]
Based on your first line of code, I'm assuming by "first quartile" you mean lowest quartile. If you want the highest quartile, just change that to
mydata.lastquart <- mydata[,mydata_cs > quantile(mydata_cs,0.75)]
You may also want to use <= or >= rather than < and >.
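To illustrate the difference between the strict and inclusive comparisons, here is a small sketch on made-up data (the data frame and the names cutoff, strict and inclusive are assumptions for illustration). With these five column sums, the default quantile() method puts the 25% cut-off exactly on one column's sum, so the two comparisons select different columns. The drop = FALSE argument is added so the result stays a data frame when only one column is selected.

```r
# Hypothetical toy data: column sums are a = 3, b = 6, c = 9, d = 12, e = 15
mydata <- data.frame(a = c(1, 2), b = c(2, 4), c = c(3, 6),
                     d = c(4, 8), e = c(5, 10))
mydata_cs <- colSums(mydata)

# Lower-quartile cut-off of the column sums (default quantile type)
cutoff <- quantile(mydata_cs, 0.25)

# Strict comparison: a column whose sum equals the cut-off is excluded
strict <- mydata[, mydata_cs < cutoff, drop = FALSE]

# Inclusive comparison: that column is kept
inclusive <- mydata[, mydata_cs <= cutoff, drop = FALSE]

names(strict)     # "a"
names(inclusive)  # "a" "b"
```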
Upvotes: 3