Reputation: 29
I have a matrix like so:
a b c d
[1] as ac ad ae
[2] bd bf bg bh
[3] NA cf cd ce
[4] NA NA dr dy
[5] NA NA NA ej
I would like to subset every column separately, keeping 50% of its observations, and store the result in a matrix or list. My output should look like this:
a b c d
[1] as ac ad ae
[2] NA bf bg bh
[3] NA NA NA ce
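Judging from the example, the desired output seems to keep the first half (rounded up) of the non-missing values in each column, in their original order, and to pad the shorter columns with NA. That reading is an assumption. The sketch below rebuilds the example matrix and applies that rule using base R only. The helper name `first_half` is hypothetical.

```r
# Example matrix from the question (character data with trailing NAs)
mv <- matrix(c("as", "bd", NA, NA, NA,
               "ac", "bf", "cf", NA, NA,
               "ad", "bg", "cd", "dr", NA,
               "ae", "bh", "ce", "dy", "ej"),
             ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))

# Keep the first ceiling(n / 2) non-missing values of each column
first_half <- lapply(seq_len(ncol(mv)), function(j) {
  x <- mv[!is.na(mv[, j]), j]          # observed values in column j
  x[seq_len(ceiling(length(x) / 2))]   # first half, rounded up
})
names(first_half) <- colnames(mv)

# Pad each column with NA to a common length and bind back into a matrix
n_max <- max(lengths(first_half))
mv.s <- sapply(first_half, function(x) c(x, rep(NA, n_max - length(x))))
mv.s
```

The list `first_half` is already the "list" form of the answer; the `sapply()` step is only needed if a rectangular matrix is wanted.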
So far I have used this code for single columns without NAs:
mv.s <- subset(mv, mv <= quantile(mv, 0.5))
Now I was thinking of using something like:
for (i in 1:15) {
mv.s[[i]] <- subset(mv[[i]], mv <= quantile(mv, 0.5))
}
However, when I do this I get the error:
Error in quantile.default(mv, 0.5) : missing values and NaN's not allowed if 'na.rm' is FALSE
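This first error only says that `quantile()` was given NA values. A minimal sketch on a hypothetical numeric vector `x` shows the usual fix, `na.rm = TRUE`:

```r
# Hypothetical numeric column with a missing value
x <- c(3, 1, NA, 2, 5)

# quantile(x, 0.5) stops with the error above; na.rm = TRUE ignores the NA
med <- quantile(x, 0.5, na.rm = TRUE)   # median of 3, 1, 2, 5 is 2.5

# Keep the non-missing values at or below the median
lower_half <- x[!is.na(x) & x <= med]
lower_half
```

`subset(x, x <= med)` gives the same result, because `subset()` drops elements where the condition evaluates to NA.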
When I try this code instead:
for (i in 1:15) {
mv.s[[i]] <- subset(mv[[i]], mv <= quantile(mv[[i]], 0.5))
}
I get:
Error in (1 - h) * qs[i] : non-numeric argument to binary operator
Any help would be appreciated.
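Two things seem to be behind the second error. On a matrix, `mv[[i]]` returns the single cell `i` (counted column by column), not column `i`; a whole column is `mv[, i]`. Also, a median cut-off needs numeric data, and `quantile()` fails on character input with the "non-numeric argument" error, so it cannot work on the string matrix shown above. The sketch below assumes the real data are numeric and uses a hypothetical numeric stand-in for `mv`:

```r
# Hypothetical numeric stand-in for mv (the example in the question is
# character data, for which a median cut-off is not defined)
mv <- matrix(c(1, 2, NA, NA, NA,
               3, 4, 5, NA, NA,
               6, 7, 8, 9, NA,
               10, 11, 12, 13, 14), ncol = 4)

mv.s <- vector("list", ncol(mv))
for (i in seq_len(ncol(mv))) {
  x <- mv[, i]                           # whole column i; mv[[i]] is one cell
  x <- x[!is.na(x)]                      # drop NAs so quantile() does not stop
  mv.s[[i]] <- x[x <= quantile(x, 0.5)]  # values at or below the median
}
mv.s
```

With this stand-in the loop keeps 1, then 3 and 4, then 6 and 7, then 10, 11 and 12, which matches the per-column counts of the desired output. Using `seq_len(ncol(mv))` instead of `1:15` keeps the loop in step with the actual number of columns.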
Upvotes: 0
Views: 2163
Reputation: 1204
Without using any package, just the apply() function, you could do the following:
apply(mat, 2, FUN = function(x){ sample(x, ceiling(length(x)/2), replace = FALSE)})
That takes a random sample of half of the observations in each column (rounded up), without replacement, and assumes that your matrix is called mat.
If you use set.seed(1) to make the random sample reproducible, the result will look like this:
[,1] [,2] [,3] [,4]
[1,] "bd" NA NA "ae"
[2,] NA "ac" "cd" "ej"
[3,] NA "cf" "bg" "dy"
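The output above shows that NA entries can be drawn too, because `sample()` picks from every element of the column. If only observed values should be sampled, a sketch along the same lines drops the NAs first and samples positions with `sample.int()`, which also sidesteps `sample()`'s special case for a single numeric value. It returns a list, since the columns end up with different lengths; `half_sample` is a hypothetical name.

```r
mat <- matrix(c("as", "bd", NA, NA, NA,
                "ac", "bf", "cf", NA, NA,
                "ad", "bg", "cd", "dr", NA,
                "ae", "bh", "ce", "dy", "ej"), ncol = 4)

set.seed(1)  # reproducible draw
half_sample <- lapply(seq_len(ncol(mat)), function(j) {
  x <- mat[!is.na(mat[, j]), j]                     # observed values in column j
  x[sample.int(length(x), ceiling(length(x) / 2))]  # random half, rounded up
})
half_sample
```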
Upvotes: 2
Reputation: 2353
The sample_frac() function in dplyr sounds like it fits your needs.
install.packages('dplyr')
library(dplyr)
subset_matrix <- apply(mv, 2, function(x) sample_frac(x, 0.5, replace = FALSE))
You can specify which fraction of rows you want sampled in sample_frac(). Using apply() column-wise will give you that fraction of observations for each column.
I did not test this because you didn't provide a sample of your data, but it looks like it should work.
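`sample_frac()` is documented as taking a data frame, while `apply()` passes each column to the function as a plain vector, so the one-liner above may error. A sketch under that assumption wraps the observed values of each column in a one-column data frame before sampling. How `sample_frac()` rounds 0.5 × n for odd-length columns is left to dplyr, so counts can differ from a rounded-up half. In recent dplyr versions `sample_frac()` is superseded by `slice_sample()`. `subset_list` and `col_df` are hypothetical names.

```r
library(dplyr)

mv <- matrix(c("as", "bd", NA, NA, NA,
               "ac", "bf", "cf", NA, NA,
               "ad", "bg", "cd", "dr", NA,
               "ae", "bh", "ce", "dy", "ej"), ncol = 4)

set.seed(1)  # reproducible draw
subset_list <- lapply(seq_len(ncol(mv)), function(j) {
  # one-column data frame holding the observed values of column j
  col_df <- data.frame(value = mv[!is.na(mv[, j]), j], stringsAsFactors = FALSE)
  # sample half of its rows without replacement, then pull the values back out
  sample_frac(col_df, 0.5, replace = FALSE)$value
})
subset_list
```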
Upvotes: 1