Reputation: 2584
I have data that looks like this:
df<- structure(list(V1 = structure(c(6L, 2L, 3L, 7L, 5L, 4L, 8L, 1L
), .Label = c("A0A0G2JDV6", "P01901", "P13745", "Q03141", "Q3TMK4",
"Q3UCW4", "Q8CBE6", "Q8VCQ8"), class = "factor"), V2 = c(1.234548336,
0.982968881, 1.521367521, 1.00623053, 0.868106341, 1.035714286,
0, 2.436170213), V3 = c(1.185419968, 1.131202691, 1.558404558,
0.775700935, 0.74580573, 0.897230321, 0, 2.686170213), V4 = c(1.0681458,
1.08999159, 1.715099715, 0.943925234, 0.774627893, 0.927842566,
0, 2.287234043), V5 = c(1.535657686, 1.25862069, 2.068376068,
1.012461059, 0.828314549, 1.04664723, 0, 2.579787234), V6 = c(1.605388273,
1.280277544, 1.792022792, 0.875389408, 0.828357567, 1.183673469,
0, 2.558510638)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6"
), class = "data.frame", row.names = c(NA, -8L))
I import the data as
df <- read.delim(" path to the data/df.txt", encoding="ASCII", header=FALSE)
What I want to do is sort the rows based on all columns: the row whose values are all higher than those of the other rows comes first, then the next highest, and so on to the end.
So the output, called df2, would look like this:
A0A0G2JDV6 2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
P13745 1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Q3UCW4 1.234548336 1.185419968 1.0681458 1.535657686 1.605388273
P01901 0.982968881 1.131202691 1.08999159 1.25862069 1.280277544
Q03141 1.035714286 0.897230321 0.927842566 1.04664723 1.183673469
Q8CBE6 1.00623053 0.775700935 0.943925234 1.012461059 0.875389408
Q3TMK4 0.868106341 0.74580573 0.774627893 0.828314549 0.828357567
Q8VCQ8 0 0 0 0 0
From df2, I then want to select the rows in which all values are higher than a threshold (for example 1.1), so df3 would be:
A0A0G2JDV6 2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
P13745 1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Upvotes: 1
Views: 113
Reputation: 386
Edit: The sorting in this answer is wrong. Please refer to Procrastinatus Maximus's answer, where it is done correctly. Thanks for pointing it out, Procrastinatus Maximus :)
Sort by all the five columns in descending order
df2 <- df[order(df$V2, df$V3, df$V4, df$V5, df$V6, decreasing = TRUE), ]
Keep only the rows where all values are greater than 1.1
df3 <- df2[apply(X = df2[paste0("V", 2:6)], MARGIN = 1, FUN = function(x) all(x > 1.1)), ]
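As a small illustration of why a multi-key order() call like the one above may not give the ranking the question asks for: with several keys, order() sorts by the first key and uses the later ones only to break ties. The sketch below uses a made-up data frame (`toy`, with hypothetical values that are not from the question) to compare it with sorting by row total:

```r
# Hypothetical two-row example; the values are invented for illustration
toy <- data.frame(
  id = c("a", "b"),
  V2 = c(1.0, 0.9),
  V3 = c(0.1, 5.0),
  stringsAsFactors = FALSE
)

# Multi-key sort: V2 decides the order, V3 is only a tie-breaker
by_keys <- toy[order(toy$V2, toy$V3, decreasing = TRUE), ]

# Sort by the row total across both value columns instead
by_sum <- toy[order(rowSums(toy[, c("V2", "V3")]), decreasing = TRUE), ]

print(by_keys$id)
print(by_sum$id)
```

Here row "a" wins the multi-key sort because of its slightly larger V2, while row "b" comes first when the columns are considered together.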
Let me know if it helps
Upvotes: 0
Reputation: 83275
To get the desired output, you can use a combination of order and rowSums. Using:
df2 <- df[order(rowSums(df[,-1]), decreasing = TRUE),]
gives:
> df2
V1 V2 V3 V4 V5 V6
8 A0A0G2JDV6 2.4361702 2.6861702 2.2872340 2.5797872 2.5585106
3 P13745 1.5213675 1.5584046 1.7150997 2.0683761 1.7920228
1 Q3UCW4 1.2345483 1.1854200 1.0681458 1.5356577 1.6053883
2 P01901 0.9829689 1.1312027 1.0899916 1.2586207 1.2802775
6 Q03141 1.0357143 0.8972303 0.9278426 1.0466472 1.1836735
4 Q8CBE6 1.0062305 0.7757009 0.9439252 1.0124611 0.8753894
5 Q3TMK4 0.8681063 0.7458057 0.7746279 0.8283145 0.8283576
7 Q8VCQ8 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
To get only the rows in which all the values are higher than 1.1, you can use:
df2[rowSums(df2[,-1] > 1.1) == 5, ]
which gives:
V1 V2 V3 V4 V5 V6
8 A0A0G2JDV6 2.436170 2.686170 2.287234 2.579787 2.558511
3 P13745 1.521368 1.558405 1.715100 2.068376 1.792023
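The 5 in the filter above is the number of value columns. A minimal sketch that derives this count from the data instead, assuming the value columns are exactly the numeric ones. The small data frame uses rounded values from the question, and `num_cols` and `threshold` are names introduced here for illustration only:

```r
# Stand-in data: a subset of the question's rows and columns, rounded
df <- data.frame(
  V1 = c("Q3UCW4", "P01901", "P13745", "Q8VCQ8", "A0A0G2JDV6"),
  V2 = c(1.23, 0.98, 1.52, 0, 2.44),
  V3 = c(1.19, 1.13, 1.56, 0, 2.69),
  V4 = c(1.07, 1.09, 1.72, 0, 2.29),
  stringsAsFactors = FALSE
)

num_cols  <- sapply(df, is.numeric)  # select value columns by type (assumption)
threshold <- 1.1                     # example cutoff from the question

# Sort by row total, largest first
df2 <- df[order(rowSums(df[, num_cols]), decreasing = TRUE), ]

# Keep rows where every value column exceeds the threshold:
# the count of TRUEs per row must equal the number of value columns
df3 <- df2[rowSums(df2[, num_cols] > threshold) == sum(num_cols), ]

print(df3)
```

Note that sorting by row sum ranks rows by their total, which matches the requested order for this data but is not the same as requiring every single value to be higher than in the rows below.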
Upvotes: 3