Reputation: 21
I have a big data frame (100 x 1748); below is a reduced 9 x 10 version:
Fraction Treatment Time Replicate A B C D E F
LL10.5AT T LL 10.5 A 11.11428 11.82154 10.445625 8.849699 10.373386 9.109676
LL10.5BT T LL 10.5 B 12.17890 11.01224 11.720548 9.405390 10.206708 10.653205
LL10.5CT T LL 10.5 C 10.80697 11.19782 11.175291 8.305949 9.696153 8.791403
OL10.5AT T OL 10.5 A 10.46481 10.81123 9.975277 7.783538 9.784773 8.640531
OL10.5BT T OL 10.5 B 10.75621 10.76371 10.625745 7.592059 9.820686 8.760861
OL10.5CT T OL 10.5 C 12.00054 11.02080 11.615536 8.903105 9.963635 10.547791
HL10.5AT T HL 10.5 A 10.87092 11.45102 10.780183 6.422136 10.424391 9.489396
HL10.5BT T HL 10.5 B 12.12334 11.29960 11.541679 9.774041 9.563639 10.532936
HL10.5CT T HL 10.5 C 10.21460 10.64746 9.886603 7.834040 9.828347 8.261546
I want to subset it so that it only contains columns whose sum is > 100. I use the following code:
dt.sub <- dt[,colSums(dt[,5:ncol(dt)]) > 100]
but I still get columns with sums < 100. I created a vector to check:
z<-colSums(dt.sub[,5:ncol(dt.sub)])
and this is its tail:
> tail(z)
501 502 503 504 505
107.9368630 90.6337275 0.8724593 0.8724593 1.3497445 1.3497445
Thank you for your help, Kasia
Upvotes: 2
Views: 34
Reputation: 50678
You're indexing columns incorrectly.
df[, c(1:4, which(colSums(df[, 5:ncol(df)]) > 100) + 4)]
# Fraction Treatment Time Replicate A B
#LL10.5AT TRUE LL 10.5 A 11.11428 11.82154
#LL10.5BT TRUE LL 10.5 B 12.17890 11.01224
#LL10.5CT TRUE LL 10.5 C 10.80697 11.19782
#OL10.5AT TRUE OL 10.5 A 10.46481 10.81123
#OL10.5BT TRUE OL 10.5 B 10.75621 10.76371
#OL10.5CT TRUE OL 10.5 C 12.00054 11.02080
#HL10.5AT TRUE HL 10.5 A 10.87092 11.45102
#HL10.5BT TRUE HL 10.5 B 12.12334 11.29960
#HL10.5CT TRUE HL 10.5 C 10.21460 10.64746
Explanation: which(colSums(df[, 5:ncol(df)]) > 100) returns the indices within df[, 5:ncol(df)] (not within df!) where the column sum is > 100; we then add 4 (since we started at index 5) and include columns 1 to 4 to get the indices of the columns in df that we want to retain.
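To make the offset problem concrete, here is a minimal sketch on made-up data. The toy data frame and the names mask, misaligned, fixed and padded are hypothetical and only serve the illustration. It assumes, as in the question, that the first four columns are metadata and everything from column 5 onward is numeric.

```r
# Hypothetical toy data: 4 metadata columns, then 3 numeric columns.
df <- data.frame(
  Fraction  = c("T", "T"),
  Treatment = c("LL", "OL"),
  Time      = c(10.5, 10.5),
  Replicate = c("A", "B"),
  A = c(10, 20),   # column sum  30
  B = c(60, 50),   # column sum 110
  C = c(70, 40)    # column sum 110
)

# Logical mask over the numeric block only: one entry per column 5..ncol(df).
mask <- colSums(df[, 5:ncol(df)]) > 100

# Original approach: the mask has length 3 but df has 7 columns, so it is
# applied starting at column 1 (and recycled) rather than at column 5.
misaligned <- df[, mask]

# Fix 1: convert to positions inside the block, then shift by the offset of 4.
fixed <- df[, c(1:4, unname(which(mask)) + 4)]

# Fix 2: pad the mask with TRUE for the metadata columns so it spans all of df.
padded <- df[, c(rep(TRUE, 4), mask)]

names(misaligned)
names(fixed)
names(padded)
```

Comparing names(misaligned) with names(fixed) should show that the unshifted mask picks columns by the wrong positions, which matches the symptom of columns with sums below 100 surviving the filter. The padded variant avoids the + 4 arithmetic, at the cost of hard-coding how many metadata columns there are.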
#df <- read.table(text =
# " Fraction Treatment Time Replicate A B C D E F
#LL10.5AT T LL 10.5 A 11.11428 11.82154 10.445625 8.849699 10.373386 9.109676
#LL10.5BT T LL 10.5 B 12.17890 11.01224 11.720548 9.405390 10.206708 10.653205
#LL10.5CT T LL 10.5 C 10.80697 11.19782 11.175291 8.305949 9.696153 8.791403
#OL10.5AT T OL 10.5 A 10.46481 10.81123 9.975277 7.783538 9.784773 8.640531
#OL10.5BT T OL 10.5 B 10.75621 10.76371 10.625745 7.592059 9.820686 8.760861
#OL10.5CT T OL 10.5 C 12.00054 11.02080 11.615536 8.903105 9.963635 10.547791
#HL10.5AT T HL 10.5 A 10.87092 11.45102 10.780183 6.422136 10.424391 9.489396
#HL10.5BT T HL 10.5 B 12.12334 11.29960 11.541679 9.774041 9.563639 10.532936
#HL10.5CT T HL 10.5 C 10.21460 10.64746 9.886603 7.834040 9.828347 8.261546", header = T)
Upvotes: 1