I would like to remove the columns of my data frame df1 whose column sum is less than 150.
My data frame is:
chr leftPos FLD0195 FLD0197 FLD0201 FLD0203 FLD0211 FLD0243
chr1 100260254 34 52 29 18 13 30
chr1 100735342 44 111 88 65 40 66
chr1 100805662 0 0 1 1 0 0
chr1 100839460 1 0 5 0 0 0
The code I'm using is below. It runs without error, but df2 is exactly the same as df1:
df2 <- df1[,(colSums(df1[,3:ncol(df1)]) > 100000),]
When you evaluate a Boolean expression on a subset of k columns, you get a logical vector of length k. When that vector is used to index a data set with n columns, R recycles it: the first n - k values from the beginning of the vector are reused until it reaches length n, so the wrong columns are selected (here the result computed for FLD0195 is applied to chr, the one for FLD0197 to leftPos, and so on). In your case the fix is simple: just add n - k TRUE values at the beginning of the logical vector, because you want to keep all the n - k columns at the beginning.
df1[c(rep(TRUE, 2L), colSums(df1[3L:ncol(df1)]) > 150L)]
# chr leftPos FLD0197
# 1 chr1 100260254 52
# 2 chr1 100735342 111
# 3 chr1 100805662 0
# 4 chr1 100839460 0
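The recycling described above can be reproduced with the sample data from the question. The sketch below rebuilds df1 by hand from the listing (the column types are an assumption: chr as character, everything else as numeric), shows which columns the unpadded logical vector actually selects, and then selects by column name instead. The variable names keep, fld_cols and big are illustrative only, not part of the original answer.

```r
# Rebuild the sample data from the question by hand.
# Assumption: chr is character and all other columns are numeric.
df1 <- data.frame(
  chr     = rep("chr1", 4),
  leftPos = c(100260254, 100735342, 100805662, 100839460),
  FLD0195 = c(34, 44, 0, 1),
  FLD0197 = c(52, 111, 0, 0),
  FLD0201 = c(29, 88, 1, 5),
  FLD0203 = c(18, 65, 1, 0),
  FLD0211 = c(13, 40, 0, 0),
  FLD0243 = c(30, 66, 0, 0)
)

# The test is computed on the 6 FLD columns only (k = 6) ...
keep <- colSums(df1[3:ncol(df1)]) > 150
print(length(keep))  # 6, while df1 has 8 columns

# ... so indexing df1 with it recycles the vector to length 8.
# Each result lands two columns to the left of the column it was computed
# for, and the first two values are reused for the last two columns.
print(names(df1[keep]))

# Name-based alternative: pick the columns to test by name, then keep the
# two annotation columns plus the FLD columns that pass the threshold.
fld_cols <- names(df1)[-(1:2)]
big <- fld_cols[colSums(df1[fld_cols]) > 150]
df2 <- df1[c("chr", "leftPos", big)]
print(df2)
```

Selecting by name keeps the annotation columns explicit, so the result does not depend on padding the logical vector with the right number of leading TRUE values. With this sample data only FLD0197 (sum 163) passes the threshold, which matches the output of the padded version above.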