John

Reputation: 1947

Check condition of data frame with corresponding vector

Let's create some artificial data and their 0.99 quantiles.

set.seed(42)
x = data.frame("Norm" = rnorm(100),
               "Unif" = runif(100),
               "Exp" = rexp(100))

quants <- apply(x, 2, quantile, 0.99)

I want to check, without a loop, which elements of each variable are bigger than that variable's 0.99 quantile.

So the first variable should be compared with the first element of quants, the second with the second, and the third with the third.

Intuitively I used x > quants, and it's good that I checked the outcome, because R seems to interpret this command differently.

For example:

> head(x > quants)
      Norm  Unif   Exp
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
[6,] FALSE FALSE  TRUE

As you can see, the third element of Exp is flagged as bigger than the 0.99 quantile. However:

> x[3, ][3] > quants[3] 
    Exp
3 FALSE 

gives FALSE. Do you know how I can fix this problem? I tried to play with apply but wasn't sure how to use it properly in this case.

Upvotes: 1

Views: 55

Answers (4)

AnilGoyal

Reputation: 26238

When evaluating x > quants, R recycles quants down the columns of x rather than across the rows. The first element of the first column is compared with quants[1], the second element of the first column with quants[2], the third with quants[3], the fourth with quants[1] again, and so on. Hence x[3, 3] is the 203rd element in this order and is compared with the second element of quants (203 %% 3 = 2), not the third. That's why you're getting the wrong result.

Also see

colSums(x > quants)

Norm Unif  Exp 
   4    0   19

which exposes the problem: if each column were compared with its own 0.99 quantile, only about one of its 100 values would exceed it.
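
The recycling described above can be checked directly. This is a minimal sketch, assuming the same x and quants as in the question; the names recycled, same, pos and idx are only for illustration.

```r
set.seed(42)
x <- data.frame(Norm = rnorm(100), Unif = runif(100), Exp = rexp(100))
quants <- apply(x, 2, quantile, 0.99)

# quants repeated to 300 values and laid out column by column,
# which is how the vector gets recycled against the data frame
recycled <- matrix(rep_len(quants, nrow(x) * ncol(x)), nrow = nrow(x))

# the naive comparison agrees with comparing against the recycled matrix
same <- identical(unname(x > quants), unname(as.matrix(x) > recycled))
print(same)

# x[3, 3] is element 203 in column order, so it is paired with quants[2] (Unif)
pos <- (3 - 1) * nrow(x) + 3
idx <- (pos - 1) %% length(quants) + 1
print(c(position = pos, quants_index = idx))
```

If same is TRUE, the naive comparison really is pairing each value with a quantile chosen by its position, not by its column.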

Upvotes: 2

Jakub.Novotny

Reputation: 3067

You could use purrr::map2_df.

# there are two objects I am iterating
# x data.frame is referenced as .x
# quants vector is referenced as .y
purrr::map2_df(x, quants, ~ .x > .y)

Upvotes: 2

Anoushiravan R

Reputation: 21938

I think the following code might help you get your desired output:

library(purrr)

set.seed(42)
x = data.frame("Norm" = rnorm(100),
               "Unif" = runif(100),
               "Exp" = rexp(100))

quants <- apply(x, 2, quantile, 0.99)

map2_dfr(x, quants, ~ .x > .y)

# A tibble: 100 x 3
   Norm  Unif  Exp  
   <lgl> <lgl> <lgl>
 1 FALSE FALSE FALSE
 2 FALSE FALSE FALSE
 3 FALSE FALSE FALSE
 4 FALSE FALSE FALSE
 5 FALSE FALSE FALSE
 6 FALSE FALSE FALSE
 7 FALSE FALSE FALSE
 8 FALSE FALSE FALSE
 9 FALSE FALSE FALSE
10 FALSE FALSE FALSE
# ... with 90 more rows

And here is another easy way if you want to stick to base R:

head(mapply(function(x, y) x > y, x, quants)) 

      Norm  Unif   Exp
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
[6,] FALSE FALSE FALSE

Upvotes: 2

PKumar

Reputation: 11128

How about this? Here x is your data frame, quants holds the values to compare against, and the function applied is the greater-than operator. sweep is applied column-wise, hence the 2:

sweep(x, 2, STATS = quants, FUN = `>`)
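
As a sanity check on the sweep result, one can count the flagged values per column. This is a minimal sketch, assuming the question's x and quants and the default quantile type; the name above is only for illustration. With 100 observations per column, the interpolated 0.99 quantile falls between the two largest values, so only the maximum of each column should be flagged.

```r
set.seed(42)
x <- data.frame(Norm = rnorm(100), Unif = runif(100), Exp = rexp(100))
quants <- apply(x, 2, quantile, 0.99)

# compare every column with its own quantile
above <- sweep(x, 2, STATS = quants, FUN = `>`)

# number of flagged values per column
print(colSums(above))

# row and column positions of the flagged values
print(which(above, arr.ind = TRUE))
```

The same check can be run on the mapply or map2_df results to confirm that all three approaches agree.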

Upvotes: 3
