I have a matrix with non-numeric values (missing values are empty strings, not NaN).
mat = read.table(textConnection(
"\ts1\ts2\ts3
g1\ta;b\ta\tb
g2\t\t\tb
g3\ta\t\ta;b"), row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mat = as.matrix(mat)
I want to subset the matrix to keep the two rows with the most values (non-empty entries).
So the result should be
g1 a;b a b # with three values
g3 a a;b # with two values
# g2 should be excluded because it only has one value
My approach would be
But I do not understand how to sort a matrix by the number of entries in each row.
Any ideas?
You can use apply over the rows to count how many elements in each row are not empty strings, then sort by that count. The sorted matrix looks like this:
# count the non-empty entries in each row, then order the rows by that count (largest first)
mat[order(apply(mat, 1, function(row) sum(row != "")), decreasing = TRUE), ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
g2 "" "" "b"
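If "the two highest" should be taken literally, i.e. keep exactly the two rows with the most values instead of applying a fixed threshold, one option (a sketch, not the only way) is to take the first two indices of the same ordering. The matrix is rebuilt with matrix() so the snippet runs on its own; n_values and top2 are just illustrative names.

```r
# Rebuild the example matrix directly (same content as the read.table version)
mat <- matrix(c("a;b", "a", "b",
                "",    "",  "b",
                "a",   "",  "a;b"),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# Number of non-empty entries per row: g1 = 3, g2 = 1, g3 = 2
n_values <- apply(mat, 1, function(row) sum(row != ""))

# Keep the two rows with the most entries; drop = FALSE keeps a matrix
top2 <- mat[head(order(n_values, decreasing = TRUE), 2), , drop = FALSE]
print(top2)
```

Since order() is stable, rows with equal counts keep their original order, so ties are broken by row position.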
If the threshold is 2, you can also apply it inside the function directly, without sorting:
mat[apply(mat, 1, function(row) sum(row != "") >= 2), ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
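One thing to watch with the threshold approach: if only one row passes the filter, mat[keep, ] silently drops to a named character vector. Adding drop = FALSE keeps the result a matrix. A small sketch using a threshold of 3 (an arbitrary choice so that only g1 qualifies; keep, as_vector and as_matrix are illustrative names):

```r
mat <- matrix(c("a;b", "a", "b",
                "",    "",  "b",
                "a",   "",  "a;b"),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# Threshold of 3: only g1 has three non-empty entries
keep <- apply(mat, 1, function(row) sum(row != "") >= 3)

as_vector <- mat[keep, ]                # dimensions dropped: a named character vector
as_matrix <- mat[keep, , drop = FALSE]  # still a 1 x 3 matrix with its row name

print(as_vector)
print(as_matrix)
```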
Another way, as suggested by @alexis_laz, is to use rowSums:
mat[rowSums(mat != "") >= 2, ]
s1 s2 s3
g1 "a;b" "a" "b"
g3 "a" "" "a;b"
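The question says missing cells are empty strings. If some of them were NA instead (an assumption, e.g. if the file had been read with na.strings = ""), mat != "" returns NA for those cells and rowSums() then returns NA for the whole row. A sketch that treats both NA and "" as missing, using a hypothetical variant of the matrix:

```r
# Hypothetical variant of the example matrix where some blanks are NA
mat <- matrix(c("a;b", "a", "b",
                NA,    "",  "b",
                "a",   NA,  "a;b"),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# The plain comparison propagates NA: g2 and g3 get NA counts
naive <- rowSums(mat != "")

# A cell counts as a value only if it is neither NA nor ""
# (FALSE & NA is FALSE, so has_value contains no NA)
has_value <- !is.na(mat) & mat != ""
n_values <- rowSums(has_value)

kept <- mat[n_values >= 2, , drop = FALSE]
print(kept)
```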