Sebastian Zeki

Reputation: 6874

How to remove columns based on colSum

I would like to remove the columns whose column sum is less than 150 from my data frame df1.

My dataframe is

chr      leftPos    FLD0195  FLD0197 FLD0201 FLD0203 FLD0211    FLD0243
chr1    100260254       34    52       29        18    13       30
chr1    100735342       44   111       88        65    40       66
chr1    100805662        0    0         1         1    0         0
chr1    100839460        1    0         5         0    0         0

The formula I'm using is below. It runs without error, but df2 is exactly the same as df1:

    df2 <- df1[,(colSums(df1[,3:ncol(df1)]) > 100000),]

Upvotes: 1

Views: 824

Answers (1)

David Arenburg

Reputation: 92282

When you run a Boolean expression on a subset of k columns, you get a logical vector of length k. When that vector is used to index a data set with n columns, its first n-k values are recycled to fill the remaining positions (until it reaches length n), so the wrong columns are selected. In your case the fix is simple: add n-k TRUE values at the beginning of the logical vector, because you want to keep all of the first n-k columns.

    df1[c(rep(TRUE, 2L), colSums(df1[3L:ncol(df1)]) > 150L)]
    #    chr   leftPos FLD0197
    # 1 chr1 100260254      52
    # 2 chr1 100735342     111
    # 3 chr1 100805662       0
    # 4 chr1 100839460       0
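
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the four rows shown in the question and applies the padding step by step. The data frame construction and the names `keep`, `wrong`, `df2` and `df3` are assumptions made for illustration. The name-based selection at the end is an alternative that is not part of the answer above.

```r
# Rebuild the example rows from the question (column types are assumed).
df1 <- data.frame(
  chr     = rep("chr1", 4),
  leftPos = c(100260254, 100735342, 100805662, 100839460),
  FLD0195 = c(34, 44, 0, 1),
  FLD0197 = c(52, 111, 0, 0),
  FLD0201 = c(29, 88, 1, 5),
  FLD0203 = c(18, 65, 1, 0),
  FLD0211 = c(13, 40, 0, 0),
  FLD0243 = c(30, 66, 0, 0)
)

# Logical vector of length k = 6: one entry per count column.
keep <- colSums(df1[3:ncol(df1)]) > 150

# Using it directly on all n = 8 columns recycles the 6-element index,
# so it no longer lines up with the count columns.
wrong <- df1[keep]

# Pad with TRUE for the two leading annotation columns so the index
# has length n = 8 and matches the columns of df1 one-to-one.
df2 <- df1[c(rep(TRUE, 2), keep)]

# Alternative: select by name, which avoids recycling altogether.
df3 <- df1[c(names(df1)[1:2], names(keep)[keep])]
```

Selecting by name is less compact, but it does not depend on knowing how many leading columns need to be padded.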

Upvotes: 4
