nofunsally

Reputation: 2091

Subset data (columns) based on quantiles of column sums

Is there a smart way to select columns from a data frame based on quantiles of the column sums? For example, select only the columns whose column sum falls in the first quartile. I can subset data based on column sums, and I can calculate quantiles of column sums, but is there a good way to combine these? Thanks.

# e.g. subset data - select columns whose column sums are less than 5
mydata <- mydata[,colSums(mydata) < 5]

# e.g. create quantiles on colSums
mydata_cs <- colSums(mydata)
quart.mydata_cs <- quantile(mydata_cs,probs=seq(0,1, by=0.25))

Upvotes: 0

Views: 4893

Answers (2)

user1317221_G

Reputation: 15441

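x <- c(1, 2, 3, 4, 5)
y <- c(4, 6, 9, 2, 9)
df <- data.frame(x, y)
# Quartile boundaries (0%, 25%, 50%, 75%, 100%) of the column sums
q <- quantile(colSums(df), probs = seq(0, 1, by = 0.25))
# Keep columns whose sum is below the 25% quantile (q[2]); drop = FALSE keeps a data frame
df[, colSums(df) < q[2], drop = FALSE]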
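
A minimal sketch of the same idea, assuming the two-column example above: `quantile()` returns a named vector, so the 25% boundary can be picked by name rather than by position, and `drop = FALSE` stops a single selected column from collapsing to a plain vector. The names `cs` and `low` are only illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(4, 6, 9, 2, 9)
df <- data.frame(x, y)

# Column sums: x = 15, y = 30
cs <- colSums(df)

# Named vector of quartile boundaries: "0%", "25%", "50%", "75%", "100%"
q <- quantile(cs, probs = seq(0, 1, by = 0.25))

# Select by name instead of q[2]; [[ ]] drops the name so the comparison is clean
low <- df[, cs < q[["25%"]], drop = FALSE]

names(low)          # only column "x" is below the 25% boundary
is.data.frame(low)  # still a data frame thanks to drop = FALSE
```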

Upvotes: 1

Jonathan Christensen

Reputation: 3866

Using your mydata_cs, the following should work:

mydata.firstquart <- mydata[,mydata_cs < quantile(mydata_cs,0.25)]

Based on your first line of code, I'm assuming by "first quartile" you mean lowest quartile. If you want the highest quartile, just change that to

mydata.lastquart <- mydata[,mydata_cs > quantile(mydata_cs,0.75)]

You may also want to use <= or >= rather than < and >.
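
To illustrate why the choice of operator matters, here is a small sketch on made-up data (the data frame and the names `strict` and `inclusive` are hypothetical, not from the question). When a column sum sits exactly on the quantile boundary, `<` excludes that column and `<=` includes it.

```r
# Hypothetical data: column sums are 3, 6, 9, 12, 15
mydata <- data.frame(a = c(1, 2), b = c(2, 4), c = c(3, 6),
                     d = c(4, 8), e = c(5, 10))
mydata_cs <- colSums(mydata)

# With the default quantile type, the 25% quantile of these sums is 6,
# which is exactly the sum of column b
cutoff <- quantile(mydata_cs, 0.25)[["25%"]]

# Strict comparison leaves out the column that sits on the boundary
strict <- mydata[, mydata_cs < cutoff, drop = FALSE]

# Inclusive comparison keeps it
inclusive <- mydata[, mydata_cs <= cutoff, drop = FALSE]

names(strict)     # "a"
names(inclusive)  # "a" "b"
```

With continuous data, exact ties with the boundary are less likely, but with counts or other integer-valued sums they are common, so it is worth deciding up front which side the boundary belongs to.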

Upvotes: 3
