Eva Serrano

Reputation: 33

Randomly replacing values in an existing matrix in R

I have an existing matrix and I want to replace some of its values with NAs, chosen uniformly at random.

I tried to use the following, but it only replaced 392 values with NA, not 452 as I expected. What am I doing wrong?

N <- 452

ind1 <- (runif(N,2,length(macro_complet$Sod)))

macro_complet$Sod[ind1] <- NA

summary(macro_complet$Sod)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.3222   0.9138   1.0790   1.1360   1.3010   2.8610 392.0000 

My data looks like this

> str(macro_complet)
'data.frame':   1504 obs. of  26 variables:
 $ Sod                     : num  8.6 13.1 12 13.8 12.9 10 7 14.8 11.3 4.9 ...
 $ Azo                     : num  2 1.7 2.2 1.9 1.89 1.61 1.72 2.1 1.63 2 ...
 $ Cal                     : num  26 28.1 24 28.5 24.5 24 17.4 26.6 24.8 10.5 ...
 $ Bic                     : num  72 82 81 84 77 68 66 81 70 37.8 ...
 $ DBO                     : num  3 2.2 3 2.7 3.3 3 3.2 2.9 2.8 2 ...
 $ AzoK                    : num  0.7 0.7 0.9 0.8 0.7 0.7 0.7 0.9 0.7 0.7 ...
 $ Orho                    : num  0.3 0.2 0.31 0.19 0.19 0.2 0.16 0.24 0.2 0.01 ...
 $ Ammo                    : num  0.12 0.16 0.15 0.13 0.19 0.22 0.19 0.16 0.17 0.08 ...
 $ Carb                    : num  0.3 0.3 2 0.3 0.3 0.3 0.3 0.3 0.3 0.5 ...
 $ Ox                      : num  10.2 9.7 9.8 9.6 9.7 9.1 9.1 8.1 9.7 10.6 ...
 $ Mag                     : num  5.5 6.5 6.3 7 6.4 5.1 6 6.7 5.7 2 ...
 $ Nit                     : num  4.2 4.7 5.7 4.6 4.2 3.5 4.9 4.5 4.2 2.8 ...
 $ Matsu                   : num  17 9 24 15 17 19 20 19 13 3.9 ...
 $ Tp                      : num  10.5 9.7 11.9 12 12.9 11.2 12.8 13.7 11.5 10.6 ...
 $ Co                      : num  3 3.45 3.3 3.54 2.7 2.7 3.3 3.49 2.8 1.8 ...
 $ Ch                      : num  17 24 22 28 25 19 13 28 23 6.4 ...
 $ Cu                      : num  25 15 20 20 15 20 15 15 20 15 ...
 $ Po                      : num  3.5 3.8 4 3.6 3.8 3.7 3 4.2 3.7 0.4 ...
 $ Ph                      : num  0.2 0.17 0.2 0.14 0.18 0.2 0.17 0.17 0.17 0.01 ...
 $ Cnd                     : int  226 275 285 295 272 225 267 283 251 61 ...
 $ Txs                     : num  93 88 89 86 87 88 84 80 91 94 ...
 $ Niti                    : num  0.06 0.09 0.07 0.06 0.08 0.07 0.08 0.11 0.1 0.01 ...
 $ Dt                      : num  9 9.7 9 10.2 8 8 7 9.4 8.5 3 ...
 $ H                       : num  7.6 7.7 7.6 7.7 7.55 7.4 7.3 7.5 7.5 7.6 ...
 $ Dco                     : int  17 12 15 13 15 20 16 14 12 7 ...
 $ Sf                      : num  22 20.5 18 22.2 22.1 21 11.6 21.7 21.9 6.8 ...

I also tried to do this for only a single variable, but got the same result.

I converted my data frame into a matrix using

as.matrix(n1)

Then I replaced some values in a single variable:

N <- 300

ind <- (runif(N,1,length(n1$Sodium)))

n1$Sodium[ind] <- NA

However, using summary() I observed that only 262 values were replaced instead of 300 as expected. What am I doing wrong?

summary(n1$Sodium)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.3222   0.8976   1.0790   1.1320   1.3010   2.8610 262.0000

Upvotes: 3

Views: 5597

Answers (3)

akrun

Reputation: 886938

We could use

vec[sample(seq_along(vec), 4, replace = FALSE)] <- NA

Upvotes: 1

Roman Luštrik

Reputation: 70623

Try this. It samples positions in your matrix uniformly without replacement (so the same element is never chosen and replaced twice). If you want some other distribution, you can set the weights with the prob argument (see ?sample).

vec <- matrix(1:25, nrow = 5)
vec[sample(1:length(vec), 4, replace = FALSE)] <- NA

vec
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    6   NA   16   NA
[2,]   NA    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
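
For the data frame column in the question, the same idea might look like the sketch below. The vector sod is a made-up stand-in for macro_complet$Sod (the real data is not available here), and the seed is only there for reproducibility.

```r
set.seed(42)                           # assumption: seed only for reproducibility

# Hypothetical stand-in for macro_complet$Sod: 1504 numeric values, no NA
sod <- rnorm(1504, mean = 10, sd = 3)

N <- 452                               # number of values to replace
idx <- sample(seq_along(sod), N, replace = FALSE)  # N distinct positions
sod[idx] <- NA

sum(is.na(sod))                        # 452
```

Because sample() draws without replacement here, every index is distinct, so the NA count equals N.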

Upvotes: 7

Ricardo Saporta

Reputation: 55340

You must apply runif in the right spot, which is the index to vec. (The way you have it now, you are asking R to draw random numbers from a uniform distribution between NA and NA, which of course does not make sense, so it gives you back NaNs.)

Try instead:

N <- 5                                      # the number of random values to replace
inds <- round(runif(N, 1, length(vec)))     # draw random values from [1, length(vec)]
vec[inds] <- NA                             # use the random values as indices into vec

Note that round(.) is not strictly necessary, since [ accepts non-integer numerics. Without it, however, the values are truncated (rounded down) rather than rounded to the nearest integer, so the resulting indices deviate slightly from a uniform distribution.
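
One caveat worth checking: indices drawn with runif() can collide once they are rounded or truncated, and assigning NA to the same position twice still produces only one NA. That would explain seeing fewer NAs than N. A small sketch, assuming a made-up vector x of length 1504 in place of the real column:

```r
set.seed(1)                      # assumption: seed only for reproducibility
n <- 1504                        # length of the column in the question
N <- 452                         # number of values to replace
x <- as.numeric(seq_len(n))      # hypothetical data without any NA

idx <- runif(N, 1, n)            # non-integer draws
n_distinct <- length(unique(as.integer(idx)))  # distinct positions after truncation

x[idx] <- NA                     # `[` truncates the indices the same way
n_na <- sum(is.na(x))

n_na == n_distinct               # TRUE: one NA per distinct position
n_na <= N                        # TRUE: collisions can only reduce the count
```

With 452 draws over about 1500 positions, the expected number of distinct indices is roughly 390, which is close to the 392 NAs reported in the question. Using sample(..., replace = FALSE) avoids the collisions altogether.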

Upvotes: 3
