Sorting rows based on all column values and selecting rows with a threshold

Question

I have data that looks like this:

df<- structure(list(V1 = structure(c(6L, 2L, 3L, 7L, 5L, 4L, 8L, 1L
), .Label = c("A0A0G2JDV6", "P01901", "P13745", "Q03141", "Q3TMK4", 
"Q3UCW4", "Q8CBE6", "Q8VCQ8"), class = "factor"), V2 = c(1.234548336, 
0.982968881, 1.521367521, 1.00623053, 0.868106341, 1.035714286, 
0, 2.436170213), V3 = c(1.185419968, 1.131202691, 1.558404558, 
0.775700935, 0.74580573, 0.897230321, 0, 2.686170213), V4 = c(1.0681458, 
1.08999159, 1.715099715, 0.943925234, 0.774627893, 0.927842566, 
0, 2.287234043), V5 = c(1.535657686, 1.25862069, 2.068376068, 
1.012461059, 0.828314549, 1.04664723, 0, 2.579787234), V6 = c(1.605388273, 
1.280277544, 1.792022792, 0.875389408, 0.828357567, 1.183673469, 
0, 2.558510638)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6"
), class = "data.frame", row.names = c(NA, -8L))

I import the data with:

df <- read.delim(" path to the data/df.txt", encoding="ASCII", header=FALSE)

What I want to do is sort the rows based on all columns. For example, the row whose five values (V2 to V6) are all higher than those of the other rows should come first, and the same rule should be repeated down to the last row.

The output, which I call df2, should look like this:

A0A0G2  2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
P13745  1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Q3UCW4  1.234548336 1.185419968 1.0681458   1.535657686 1.605388273
P01901  0.982968881 1.131202691 1.08999159  1.25862069  1.280277544
Q03141  1.035714286 0.897230321 0.927842566 1.04664723  1.183673469
Q8CBE6  1.00623053  0.775700935 0.943925234 1.012461059 0.875389408
Q3TMK4  0.868106341 0.74580573  0.774627893 0.828314549 0.828357567
Q8VCQ8  0           0           0           0           0

From df2, I then want to select the rows in which all values are higher than a threshold (for example 1.1), so df3 would be:

A0A0G2  2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
P13745  1.521367521 1.558404558 1.715099715 2.068376068 1.792022792

Jaap · Accepted Answer

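To get the desired output, you can combine order() with rowSums(): sum the numeric columns of each row (everything except V1) and sort the rows by that total in decreasing order. Using: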

df2 <- df[order(rowSums(df[,-1]), decreasing = TRUE),]

gives:

> df2
          V1        V2        V3        V4        V5        V6
8 A0A0G2JDV6 2.4361702 2.6861702 2.2872340 2.5797872 2.5585106
3     P13745 1.5213675 1.5584046 1.7150997 2.0683761 1.7920228
1     Q3UCW4 1.2345483 1.1854200 1.0681458 1.5356577 1.6053883
2     P01901 0.9829689 1.1312027 1.0899916 1.2586207 1.2802775
6     Q03141 1.0357143 0.8972303 0.9278426 1.0466472 1.1836735
4     Q8CBE6 1.0062305 0.7757009 0.9439252 1.0124611 0.8753894
5     Q3TMK4 0.8681063 0.7458057 0.7746279 0.8283145 0.8283576
7     Q8VCQ8 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
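Sorting by the row total is a proxy for "all values higher than the rest". A closer match to that wording, offered here as an alternative and not as the answer's method, is to sort by each row's smallest value: a row then ranks high only when every one of its values is high, and the "all values > 1.1" filter becomes a simple test on that minimum. The sketch below assumes the data from the question, rebuilt with data.frame() and with V1 as character instead of factor.

```r
# Example data from the question, rebuilt with data.frame()
# (V1 is kept as character instead of factor for simplicity)
df <- data.frame(
  V1 = c("Q3UCW4", "P01901", "P13745", "Q8CBE6",
         "Q3TMK4", "Q03141", "Q8VCQ8", "A0A0G2JDV6"),
  V2 = c(1.234548336, 0.982968881, 1.521367521, 1.00623053,
         0.868106341, 1.035714286, 0, 2.436170213),
  V3 = c(1.185419968, 1.131202691, 1.558404558, 0.775700935,
         0.74580573, 0.897230321, 0, 2.686170213),
  V4 = c(1.0681458, 1.08999159, 1.715099715, 0.943925234,
         0.774627893, 0.927842566, 0, 2.287234043),
  V5 = c(1.535657686, 1.25862069, 2.068376068, 1.012461059,
         0.828314549, 1.04664723, 0, 2.579787234),
  V6 = c(1.605388273, 1.280277544, 1.792022792, 0.875389408,
         0.828357567, 1.183673469, 0, 2.558510638),
  stringsAsFactors = FALSE
)

vals <- df[, -1]  # numeric columns only (drop the ID column V1)

# Key used in the answer: the total of each row
by_sum <- df[order(rowSums(vals), decreasing = TRUE), ]

# Alternative key (not from the answer): the smallest value in each row,
# so a row ranks high only when every one of its values is high
row_min <- apply(vals, 1, min)
by_min <- df[order(row_min, decreasing = TRUE), ]

# With the minimum as key, "all values > 1.1" is simply "row minimum > 1.1"
df3 <- by_min[apply(by_min[, -1], 1, min) > 1.1, ]

print(by_min$V1)
print(df3$V1)
```

For this particular data both keys produce the same order as the desired df2. They can disagree in general: a row such as (3, 0.5) has a larger total than (1.2, 1.2) but a smaller minimum.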

To get only the rows in which all the values are higher than 1.1, you can use:

df2[rowSums(df2[,-1] > 1.1) == 5, ]  # 5 = number of value columns (V2 to V6)

which gives:

          V1       V2       V3       V4       V5       V6
8 A0A0G2JDV6 2.436170 2.686170 2.287234 2.579787 2.558511
3     P13745 1.521368 1.558405 1.715100 2.068376 1.792023
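The filter above works because df2[,-1] > 1.1 yields a logical matrix, and rowSums() counts the TRUE values in each row; a row is kept when that count equals the number of value columns. A minimal sketch that avoids hard-coding the 5 is shown below. The function name keep_all_above and the small toy data frame are hypothetical, for illustration only.

```r
# Hypothetical helper: keep the rows whose value columns are all above `threshold`.
# `id_col` is the position of the ID column to exclude from the comparison.
keep_all_above <- function(data, threshold, id_col = 1) {
  vals <- data[, -id_col, drop = FALSE]
  # Count values above the threshold per row and compare with the number of
  # value columns; which() drops rows where the result is NA (missing values)
  data[which(rowSums(vals > threshold) == ncol(vals)), , drop = FALSE]
}

# Hypothetical toy data
toy <- data.frame(
  id = c("a", "b", "c"),
  x  = c(2.0, 1.5, 0.9),
  y  = c(1.2, 1.0, 3.0),
  stringsAsFactors = FALSE
)

res <- keep_all_above(toy, 1.1)
print(res$id)
```

Applied to the question's data, keep_all_above(df2, 1.1) should select the same rows as the one-liner above, since df2 has exactly five value columns.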
