nik

Sorting rows based on all column values and selecting with a threshold

I have data that looks like this:

df<- structure(list(V1 = structure(c(6L, 2L, 3L, 7L, 5L, 4L, 8L, 1L
), .Label = c("A0A0G2JDV6", "P01901", "P13745", "Q03141", "Q3TMK4", 
"Q3UCW4", "Q8CBE6", "Q8VCQ8"), class = "factor"), V2 = c(1.234548336, 
0.982968881, 1.521367521, 1.00623053, 0.868106341, 1.035714286, 
0, 2.436170213), V3 = c(1.185419968, 1.131202691, 1.558404558, 
0.775700935, 0.74580573, 0.897230321, 0, 2.686170213), V4 = c(1.0681458, 
1.08999159, 1.715099715, 0.943925234, 0.774627893, 0.927842566, 
0, 2.287234043), V5 = c(1.535657686, 1.25862069, 2.068376068, 
1.012461059, 0.828314549, 1.04664723, 0, 2.579787234), V6 = c(1.605388273, 
1.280277544, 1.792022792, 0.875389408, 0.828357567, 1.183673469, 
0, 2.558510638)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6"
), class = "data.frame", row.names = c(NA, -8L))

I import the data as follows:

df <- read.delim(" path to the data/df.txt", encoding="ASCII", header=FALSE)

What I want to do is sort the rows based on all columns: for example, the row whose five values are all higher than those of the remaining rows should come first, and so on down to the last row.

So the output, called df2, would look like this:

A0A0G2  2.436170213  2.686170213  2.287234043  2.579787234  2.558510638
P13745  1.521367521  1.558404558  1.715099715  2.068376068  1.792022792
Q3UCW4  1.234548336  1.185419968  1.0681458    1.535657686  1.605388273
P01901  0.982968881  1.131202691  1.08999159   1.25862069   1.280277544
Q03141  1.035714286  0.897230321  0.927842566  1.04664723   1.183673469
Q8CBE6  1.00623053   0.775700935  0.943925234  1.012461059  0.875389408
Q3TMK4  0.868106341  0.74580573   0.774627893  0.828314549  0.828357567
Q8VCQ8  0            0            0            0            0

From df2, I then want to select the rows in which all values are higher than a threshold (for example 1.1), so df3 would be:

A0A0G2  2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
P13745  1.521367521 1.558404558 1.715099715 2.068376068 1.792022792

Upvotes: 1

Answers (2)

vasanthcullen

Edit: The sorting in this answer is wrong, because order() with several columns sorts by V2 and uses V3 to V6 only to break ties. Please refer to Procrastinatus Maximus's answer, which is correct. Thanks to Procrastinatus Maximus for pointing it out :)

Sort by all five columns in descending order:

df2 <- df[order(df$V2, df$V3, df$V4, df$V5, df$V6, decreasing = TRUE), ]
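A small made-up example shows how order() treats several keys: it sorts by the first one and only looks at the later ones to break ties. The data frame toy and its columns a, b and c are purely illustrative, not from the question:

```r
# Made-up data: row 1 is larger only in column a; row 2 is larger in b and c.
toy <- data.frame(a = c(2, 1), b = c(0, 9), c = c(0, 9))

# Several keys: sort by a, and use b and c only to break ties in a.
by_keys <- order(toy$a, toy$b, toy$c, decreasing = TRUE)

# Row sums: sort by the total across all columns.
by_sums <- order(rowSums(toy), decreasing = TRUE)

by_keys  # 1 2
by_sums  # 2 1
```

In the question's data V2 has no ties, so the multi-key sort above is effectively a sort by V2 alone, which is why P01901 ends up below Q03141 and Q8CBE6.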

Keep only the rows where all values are greater than 1.1:

df3 <- df2[apply(X = df2[paste0("V", 2:6)], MARGIN = 1, FUN = function(x) all(x > 1.1)), ]
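The filtering step itself does not depend on how the rows were sorted, so it can be checked on the unsorted data as well. A sketch, assuming the question's data is rebuilt with read.table(text = ...) so the snippet runs on its own (keep is just an illustrative name):

```r
# Rebuild the question's data: IDs in V1, values in V2..V6 (header = FALSE).
df <- read.table(text = "
Q3UCW4     1.234548336 1.185419968 1.0681458   1.535657686 1.605388273
P01901     0.982968881 1.131202691 1.08999159  1.25862069  1.280277544
P13745     1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Q8CBE6     1.00623053  0.775700935 0.943925234 1.012461059 0.875389408
Q3TMK4     0.868106341 0.74580573  0.774627893 0.828314549 0.828357567
Q03141     1.035714286 0.897230321 0.927842566 1.04664723  1.183673469
Q8VCQ8     0           0           0           0           0
A0A0G2JDV6 2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
", header = FALSE)

# TRUE for each row whose V2..V6 values all exceed 1.1.
keep <- apply(X = df[paste0("V", 2:6)], MARGIN = 1, FUN = function(x) all(x > 1.1))
df3 <- df[keep, ]
as.character(df3$V1)
```

Only P13745 and A0A0G2JDV6 pass, the same two rows as df3 in the question; here they simply appear in their original row order.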

Let me know if this helps.

Upvotes: 0

Jaap

To get the desired output, you can combine order with rowSums, so that the rows are sorted by the sum of their values. Using:

df2 <- df[order(rowSums(df[,-1]), decreasing = TRUE),]

gives:

> df2
          V1        V2        V3        V4        V5        V6
8 A0A0G2JDV6 2.4361702 2.6861702 2.2872340 2.5797872 2.5585106
3     P13745 1.5213675 1.5584046 1.7150997 2.0683761 1.7920228
1     Q3UCW4 1.2345483 1.1854200 1.0681458 1.5356577 1.6053883
2     P01901 0.9829689 1.1312027 1.0899916 1.2586207 1.2802775
6     Q03141 1.0357143 0.8972303 0.9278426 1.0466472 1.1836735
4     Q8CBE6 1.0062305 0.7757009 0.9439252 1.0124611 0.8753894
5     Q3TMK4 0.8681063 0.7458057 0.7746279 0.8283145 0.8283576
7     Q8VCQ8 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
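Sorting by row sums puts the rows with the largest totals first, but it does not guarantee that every row is higher in all five columns than the row below it. A sketch that checks this for each pair of neighbouring rows, again rebuilding the data with read.table(text = ...) (the names vals and dominates are illustrative):

```r
# Rebuild the question's data: IDs in V1, values in V2..V6 (header = FALSE).
df <- read.table(text = "
Q3UCW4     1.234548336 1.185419968 1.0681458   1.535657686 1.605388273
P01901     0.982968881 1.131202691 1.08999159  1.25862069  1.280277544
P13745     1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Q8CBE6     1.00623053  0.775700935 0.943925234 1.012461059 0.875389408
Q3TMK4     0.868106341 0.74580573  0.774627893 0.828314549 0.828357567
Q03141     1.035714286 0.897230321 0.927842566 1.04664723  1.183673469
Q8VCQ8     0           0           0           0           0
A0A0G2JDV6 2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
", header = FALSE)

# Same sort as in this answer: largest row sum first.
df2 <- df[order(rowSums(df[, -1]), decreasing = TRUE), ]

# For each neighbouring pair, is the upper row >= the lower row in every column?
vals <- as.matrix(df2[, -1])
dominates <- sapply(seq_len(nrow(vals) - 1),
                    function(i) all(vals[i, ] >= vals[i + 1, ]))
dominates  # TRUE TRUE FALSE FALSE FALSE TRUE TRUE
```

Q3UCW4 and P01901, for instance, each beat the other in at least one column (V2 versus V4), so no ordering of this data can have every row higher in all columns than all the rows below it; the row sum is one reasonable way to rank such rows.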

To get only the rows in which all the values are higher than 1.1, you can use:

df2[rowSums(df2[,-1] > 1.1) == 5, ]

which gives:

          V1       V2       V3       V4       V5       V6
8 A0A0G2JDV6 2.436170 2.686170 2.287234 2.579787 2.558511
3     P13745 1.521368 1.558405 1.715100 2.068376 1.792023
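The 5 in == 5 is the number of value columns (V2 to V6). If the number of columns or the threshold may change, one option is to wrap the filter in a small function; keep_all_above below is a hypothetical helper, not part of base R:

```r
# Rebuild the question's data: IDs in V1, values in V2..V6 (header = FALSE).
df <- read.table(text = "
Q3UCW4     1.234548336 1.185419968 1.0681458   1.535657686 1.605388273
P01901     0.982968881 1.131202691 1.08999159  1.25862069  1.280277544
P13745     1.521367521 1.558404558 1.715099715 2.068376068 1.792022792
Q8CBE6     1.00623053  0.775700935 0.943925234 1.012461059 0.875389408
Q3TMK4     0.868106341 0.74580573  0.774627893 0.828314549 0.828357567
Q03141     1.035714286 0.897230321 0.927842566 1.04664723  1.183673469
Q8VCQ8     0           0           0           0           0
A0A0G2JDV6 2.436170213 2.686170213 2.287234043 2.579787234 2.558510638
", header = FALSE)
df2 <- df[order(rowSums(df[, -1]), decreasing = TRUE), ]

# Hypothetical helper: keep rows whose value columns (every column except
# the first) all exceed `threshold`.
keep_all_above <- function(d, threshold) {
  values <- d[, -1, drop = FALSE]
  # Count passing values per row; keep the row only if all value columns pass.
  d[rowSums(values > threshold) == ncol(values), , drop = FALSE]
}

keep_all_above(df2, 1.1)  # the same two rows as above
keep_all_above(df2, 1)    # also keeps Q3UCW4, whose smallest value is about 1.068
```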

Upvotes: 3