Reputation: 33
I have an existing matrix and I want to replace some of its values with NAs, chosen uniformly at random.
I tried the following, but it replaced only 392 values with NA, not 452 as I expected. What am I doing wrong?
N <- 452
ind1 <- (runif(N,2,length(macro_complet$Sod)))
macro_complet$Sod[ind1] <- NA
summary(macro_complet$Sod)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.3222 0.9138 1.0790 1.1360 1.3010 2.8610 392.0000
My data looks like this
> str(macro_complet)
'data.frame': 1504 obs. of 26 variables:
$ Sod : num 8.6 13.1 12 13.8 12.9 10 7 14.8 11.3 4.9 ...
$ Azo : num 2 1.7 2.2 1.9 1.89 1.61 1.72 2.1 1.63 2 ...
$ Cal : num 26 28.1 24 28.5 24.5 24 17.4 26.6 24.8 10.5 ...
$ Bic : num 72 82 81 84 77 68 66 81 70 37.8 ...
$ DBO : num 3 2.2 3 2.7 3.3 3 3.2 2.9 2.8 2 ...
$ AzoK : num 0.7 0.7 0.9 0.8 0.7 0.7 0.7 0.9 0.7 0.7 ...
$ Orho : num 0.3 0.2 0.31 0.19 0.19 0.2 0.16 0.24 0.2 0.01 ...
$ Ammo : num 0.12 0.16 0.15 0.13 0.19 0.22 0.19 0.16 0.17 0.08 ...
$ Carb : num 0.3 0.3 2 0.3 0.3 0.3 0.3 0.3 0.3 0.5 ...
$ Ox : num 10.2 9.7 9.8 9.6 9.7 9.1 9.1 8.1 9.7 10.6 ...
$ Mag : num 5.5 6.5 6.3 7 6.4 5.1 6 6.7 5.7 2 ...
$ Nit : num 4.2 4.7 5.7 4.6 4.2 3.5 4.9 4.5 4.2 2.8 ...
$ Matsu : num 17 9 24 15 17 19 20 19 13 3.9 ...
$ Tp : num 10.5 9.7 11.9 12 12.9 11.2 12.8 13.7 11.5 10.6 ...
$ Co : num 3 3.45 3.3 3.54 2.7 2.7 3.3 3.49 2.8 1.8 ...
$ Ch : num 17 24 22 28 25 19 13 28 23 6.4 ...
$ Cu : num 25 15 20 20 15 20 15 15 20 15 ...
$ Po : num 3.5 3.8 4 3.6 3.8 3.7 3 4.2 3.7 0.4 ...
$ Ph : num 0.2 0.17 0.2 0.14 0.18 0.2 0.17 0.17 0.17 0.01 ...
$ Cnd : int 226 275 285 295 272 225 267 283 251 61 ...
$ Txs : num 93 88 89 86 87 88 84 80 91 94 ...
$ Niti : num 0.06 0.09 0.07 0.06 0.08 0.07 0.08 0.11 0.1 0.01 ...
$ Dt : num 9 9.7 9 10.2 8 8 7 9.4 8.5 3 ...
$ H : num 7.6 7.7 7.6 7.7 7.55 7.4 7.3 7.5 7.5 7.6 ...
$ Dco : int 17 12 15 13 15 20 16 14 12 7 ...
$ Sf : num 22 20.5 18 22.2 22.1 21 11.6 21.7 21.9 6.8 ...
I also tried to do this for only a single variable, but got the same result.
I converted my data frame into a matrix using
as.matrix(n1)
then I replaced some values for only one variable
N <- 300
ind <- (runif(N,1,length(n1$Sodium)))
n1$Sodium[ind] <- NA
However, using summary() I observed that only 262 values were replaced instead of the 300 I expected. What am I doing wrong?
summary(n1$Sodium)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.3222 0.8976 1.0790 1.1320 1.3010 2.8610 262.0000
Upvotes: 3
Views: 5597
Reputation: 886938
We could use
vec[sample(seq_along(vec), 4, replace = FALSE)] <- NA
Upvotes: 1
Reputation: 70623
Try this. It samples positions in your matrix uniformly without replacement (so the same position is not chosen and replaced twice). If you want some other distribution, you can modify the weights using the prob argument (see ?sample).
vec <- matrix(1:25, nrow = 5)
vec[sample(1:length(vec), 4, replace = FALSE)] <- NA
vec
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    6   NA   16   NA
[2,]   NA    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
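As a rough sketch of how the same idea might carry over to a single data-frame column like the one in the question: `df` below is a made-up stand-in for macro_complet (random values, not the real data), and the seed is only there to make the run repeatable.

```r
set.seed(42)                          # arbitrary seed, only for reproducibility

# Hypothetical stand-in for macro_complet: 1504 rows, one numeric column
df <- data.frame(Sod = rnorm(1504))

N <- 452                              # number of values to blank out
idx <- sample(seq_len(nrow(df)), N)   # N distinct row indices (no replacement)
df$Sod[idx] <- NA

sum(is.na(df$Sod))                    # 452
```

Because sample() draws without replacement by default, the N indices are distinct, so exactly N entries become NA.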
Upvotes: 7
Reputation: 55340
You must apply runif in the right spot, which is the index to vec. (The way you have it now, you are asking R to draw random numbers from a uniform distribution between NA and NA, which of course does not make sense, so it gives you back NaNs.)
Try instead:
N <- 5                                    # the number of random values to replace
inds <- round(runif(N, 1, length(vec)))   # draw random values from [1, length(vec)]
# note: rounding can yield duplicate indices, so fewer than N values may end up NA
vec[inds] <- NA                           # use the random values as indices into vec
Note that it is not necessary to use round(.), since [ will accept non-integer numerics. They are truncated toward zero, though, so the result is only approximately uniform: the last index is essentially never chosen.
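To see why drawing indices with runif() tends to fall short of N, here is a small sketch using the sizes from the question (1504 rows, N = 452, indices drawn between 2 and 1504). The variable names are illustrative and the seed is arbitrary.

```r
set.seed(1)                           # arbitrary seed, only for reproducibility
n_rows <- 1504
N <- 452

ind <- runif(N, 2, n_rows)            # fractional values, as in the question
pos <- trunc(ind)                     # positions `[` effectively uses
n_distinct <- length(unique(pos))     # cells that would actually become NA
n_distinct < N                        # duplicates push this below N

# Expected number of distinct positions when N draws land in
# `bins` equally likely slots (truncated values range over 2..1503)
bins <- n_rows - 2
expected <- bins * (1 - (1 - 1 / bins)^N)
round(expected)                       # about 390, close to the 392 NAs reported
```

The shortfall is just collisions: several draws truncate to the same position, and assigning NA to a position twice still counts once.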
Upvotes: 3