Revan

Reputation: 2322

R: Sort matrix based on amount of row values

I have a matrix with non-numeric values (missing values are empty strings, not NA).

# Tabs are written as \t so the empty cells survive copy and paste
mat = read.table(textConnection(
"\ts1\ts2\ts3
g1\ta;b\ta\tb
g2\t\t\tb
g3\ta\t\ta;b"), row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mat = as.matrix(mat)

What I want to do is subset the matrix to keep the two rows with the most non-empty values.

So the result should be

g1  a;b  a  b # with three values
g3  a       a;b # with two values
# g2 should be excluded because it only has one value

My approach would be

But I do not understand how to sort a matrix by the number of non-empty entries per row.

Any ideas?

Upvotes: 1

Views: 353

Answers (1)

akuiper

Reputation: 214927

You can use apply over the rows to count how many elements in each row are not an empty string, then order the rows by that count in decreasing order. The sorted matrix looks like this:

mat[order(apply(mat, 1, function(row) sum(row != "")), decreasing = TRUE), ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"
g2 ""    ""  "b"  
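
Since the question asks for the two fullest rows rather than a full sort, the ordering above can be truncated with head. This is only a sketch: the matrix is rebuilt by hand from the output shown above, and counts, n_top and top are names made up for the illustration.

```r
# Example matrix rebuilt by hand (filled column by column), values taken from the output above
mat <- matrix(
  c("a;b", "",  "a",     # column s1 for g1, g2, g3
    "a",   "",  "",      # column s2
    "b",   "b", "a;b"),  # column s3
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

# Number of non-empty cells in each row
counts <- rowSums(mat != "")

# Keep the n_top rows with the highest counts (n_top is a hypothetical parameter)
n_top <- 2
top <- mat[head(order(counts, decreasing = TRUE), n_top), , drop = FALSE]
top
```

Ties are resolved by order, which keeps tied rows in their original order, so which of two equally full rows makes the cut depends on row position.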

If you have a threshold instead, say at least 2 non-empty values per row, you can filter on the count directly without sorting:

mat[apply(mat, 1, function(row) sum(row != "") >= 2), ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"

Another way, as suggested by @alexis_laz, is to use rowSums:

mat[rowSums(mat != "") >= 2, ]
   s1    s2  s3   
g1 "a;b" "a" "b"  
g3 "a"   ""  "a;b"
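
One thing to watch with any of these subsetting forms: if only one row passes the filter, R drops the result to a plain character vector unless drop = FALSE is given. A small sketch, again with the matrix rebuilt by hand from the output above and with a threshold of 3 picked only to force a single match:

```r
# Example matrix rebuilt by hand (filled column by column), values taken from the output above
mat <- matrix(
  c("a;b", "",  "a",
    "a",   "",  "",
    "b",   "b", "a;b"),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

keep <- rowSums(mat != "") >= 3   # only g1 has three non-empty cells

one_row_vec <- mat[keep, ]                 # dropped to a named character vector
one_row_mat <- mat[keep, , drop = FALSE]   # stays a 1 x 3 matrix, row name kept

is.matrix(one_row_vec)
is.matrix(one_row_mat)
```

Adding drop = FALSE is worth it whenever later code expects a matrix regardless of how many rows match.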

Upvotes: 3
