Fabian Stolz

Reputation: 2085

Replace negative values by zero

We want to set all negative values in an array to zero.

I tried a lot of things but have not found a working solution yet. I thought about a for loop with a condition, but this does not seem to work.

#pred_precipitation is our array
pred_precipitation <-rnorm(25,2,4)     

for (i in nrow(pred_precipitation))
{
  if (pred_precipitation[i]<0) {pred_precipitation[i] = 0}
  else{pred_precipitation[i] = pred_precipitation[i]}
}

Upvotes: 34

Views: 101187

Answers (5)

Emily

Reputation: 71

To answer @ah bon: if there are multiple columns, i.e., both varA and varB need their negative values replaced with 0, we can use mutate(across()) to avoid repeating the ifelse statement.

If varA and varB are adjacent:

df %>%
  mutate(across(varA:varB, ~ ifelse(.x < 0, 0, .x)))

If they are not:

df %>%
  mutate(across(c(varA, varB), ~ ifelse(.x < 0, 0, .x)))
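
For a self-contained run of the second variant, here is a minimal sketch. The data frame and its values are made up for illustration (only the column names varA and varB come from the answer), and across() assumes dplyr 1.0.0 or later:

```r
library(dplyr)  # across() assumes dplyr >= 1.0.0

# Hypothetical example data; only the names varA and varB come from the text above
df <- data.frame(
  id   = 1:4,
  varA = c(-1, 2, -3, 4),
  varB = c(0.5, -0.5, 1.5, -2)
)

# Replace negatives with 0 in the selected columns only; id is left untouched
result <- df %>%
  mutate(across(c(varA, varB), ~ ifelse(.x < 0, 0, .x)))

result
```

Columns not named inside across() pass through unchanged, which is why id keeps its original values.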

Upvotes: 2

Simon Stolz

Reputation: 237

If your main object is a tibble or data frame, you can also use dplyr from the tidyverse. Compared with the replacement proposed by Ari B. Friedman, this replacement can be written "on the fly" and combined with other mutations.

An example using dplyr and the %>% pipes would look like this:

df %>% mutate(varA = if_else(varA < 0, 0, varA))

You can add further mutations (i.e., new variables) within the mutate() statement. An advantage that I see in this type of coding is that you do not run the risk of skipping or re-executing an individual transformation step, since they are all grouped in one statement. For example, by adding %>% View() in RStudio you can already preview the result. However, the result is not yet stored anywhere ("on the fly"). This way you keep your namespace / environment clean when changing the code.

Upvotes: 10

Ari B. Friedman

Reputation: 72731

Thanks for the reproducible example. This is pretty basic R stuff. You can assign to selected elements of a vector (note an array has dimensions, and what you've given is a vector not an array):

> pred_precipitation[pred_precipitation<0] <- 0
> pred_precipitation
 [1] 1.2091281 0.0000000 7.7665555 0.0000000 0.0000000 0.0000000 0.5151504 0.0000000 1.8281251
[10] 0.5098688 2.8370263 0.4895606 1.5152191 4.1740177 7.1527742 2.8992215 4.5322934 6.7180530
[19] 0.0000000 1.1914052 3.6152333 0.0000000 0.3778717 0.0000000 1.4940469
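
As for why the loop in the question does nothing: nrow() returns NULL for a plain vector, so for (i in nrow(pred_precipitation)) iterates zero times. A minimal sketch of a working loop using seq_along(), compared against the vectorized assignment (the seed is an arbitrary choice, added only for reproducibility):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
pred_precipitation <- rnorm(25, 2, 4)

# nrow() of a dimensionless vector is NULL, so the original loop body never runs
is.null(nrow(pred_precipitation))  # TRUE

# Loop version: seq_along() yields the indices 1..length(x)
looped <- pred_precipitation
for (i in seq_along(looped)) {
  if (looped[i] < 0) looped[i] <- 0
}

# Vectorized version: logical indexing selects the negative elements
vectorized <- pred_precipitation
vectorized[vectorized < 0] <- 0

identical(looped, vectorized)  # TRUE
```

Both give the same result; the vectorized form is simply shorter and the idiomatic choice in R.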

Benchmark wars!

@James has found an even faster method and left it in a comment. I upvoted him, if only because I know his victory will be short-lived.

First, I try compiling, but that doesn't seem to help anyone:

p <- rnorm(10000)
gsk3 <- function(x) { x[x<0] <- 0; x }
jmsigner <- function(x) ifelse(x<0, 0, x)
joshua <- function(x) pmax(x,0)  # pmax, not pmin: pmin(x,0) would zero the positive values instead
james <- function(x) (abs(x)+x)/2
library(compiler)
gsk3.c <- cmpfun(gsk3)
jmsigner.c <- cmpfun(jmsigner)
joshua.c <- cmpfun(joshua)
james.c <- cmpfun(james)

library(microbenchmark)
microbenchmark(joshua(p),joshua.c(p),gsk3(p),gsk3.c(p),jmsigner(p),james(p),jmsigner.c(p),james.c(p))
           expr      min        lq    median        uq      max
1     gsk3.c(p)  251.782  255.0515  266.8685  269.5205  457.998
2       gsk3(p)  256.262  261.6105  270.7340  281.3560 2940.486
3    james.c(p)   38.418   41.3770   43.3020   45.6160  132.342
4      james(p)   38.934   42.1965   43.5700   47.2085 4524.303
5 jmsigner.c(p) 2047.739 2145.9915 2198.6170 2291.8475 4879.418
6   jmsigner(p) 2047.502 2169.9555 2258.6225 2405.0730 5064.334
7   joshua.c(p)  237.008  244.3570  251.7375  265.2545  376.684
8     joshua(p)  237.545  244.8635  255.1690  271.9910  430.566

[Figure: compiled comparison]
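
Why the arithmetic trick in james() works: for x >= 0, (abs(x) + x) / 2 is (x + x) / 2 = x, and for x < 0 it is (-x + x) / 2 = 0. A quick sanity check, sketched on arbitrary sample data, that it agrees with subset replacement:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(1000)

james <- function(x) (abs(x) + x) / 2          # arithmetic clamp at zero
gsk3  <- function(x) { x[x < 0] <- 0; x }      # subset replacement

all.equal(james(x), gsk3(x))  # TRUE
```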

But wait! Dirk wrote this Rcpp thing. Can a complete C++ incompetent read his JSS paper, adapt his example, and write the fastest function of them all? Stay tuned, dear listeners.

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
microbenchmark(joshua(p),joshua.c(p),gsk3(p),gsk3.c(p),jmsigner(p),james(p),jmsigner.c(p),james.c(p), cpp_if(p))
         expr      min        lq    median        uq       max
1   cpp_if(p)    8.233   10.4865   11.6000   12.4090    69.512
2     gsk3(p)  170.572  172.7975  175.0515  182.4035  2515.870
3    james(p)   37.074   39.6955   40.5720   42.1965  2396.758
4 jmsigner(p) 1110.313 1118.9445 1133.4725 1164.2305 65942.680
5   joshua(p)  237.135  240.1655  243.3990  250.3660  2597.429

[Figure: with rcpp comparison]

That's affirmative, captain.

Note that cpp_if modifies its input p in place, even if you don't assign the result back to it. If you want to avoid that behavior, you have to clone:

cpp_ifclone_src <- '
  Rcpp::NumericVector xa(Rcpp::clone(a));
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_ifclone <- cxxfunction(signature(a="numeric"), cpp_ifclone_src, plugin="Rcpp")

Unfortunately, cloning kills the speed advantage.

Upvotes: 72

Joshua Ulrich

Reputation: 176638

I would use pmax because ifelse can be a bit slow at times and subset-replacement creates an additional vector (which can be an issue with large data sets).

set.seed(21)
pred_precipitation <- rnorm(25,2,4)
p <- pmax(pred_precipitation,0)
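
For illustration, here is what pmax does on a small hand-made vector (the values are made up). It takes the element-wise maximum of x and 0, recycling the 0, so negatives are lifted to zero and everything else is left alone. Note that pmin(x, 0) does the opposite and zeroes out the positive values:

```r
x <- c(-1.5, 0.3, 2, -4)

pmax(x, 0)  # 0.0 0.3 2.0 0.0   (negatives clamped to zero)
pmin(x, 0)  # -1.5 0.0 0.0 -4.0 (positives clamped to zero, not what we want)
```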

Subset-replacement is by far the fastest, though:

library(rbenchmark)
gsk3 <- function(x) { x[x<0] <- 0; x }
jmsigner <- function(x) ifelse(x<0, 0, x)
joshua <- function(x) pmax(x,0)  # pmax, not pmin: pmin(x,0) would zero the positive values instead
benchmark(joshua(p), gsk3(p), jmsigner(p), replications=10000, order="relative")
         test replications elapsed relative user.self sys.self
2     gsk3(p)        10000   0.215 1.000000     0.216    0.000
1   joshua(p)        10000   0.444 2.065116     0.416    0.016
3 jmsigner(p)        10000   0.656 3.051163     0.652    0.000

[Figure: autoplot microbenchmark]

Upvotes: 20

johannes

Reputation: 14413

Alternatively you can also use ifelse:

ifelse(pred_precipitation < 0, 0, pred_precipitation)
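
Since the question talks about arrays: ifelse returns a result shaped like its test argument, so the same call also works on a matrix. A minimal sketch on a made-up matrix, checked against subset replacement:

```r
# Hypothetical 2 x 3 matrix with some negative entries
m <- matrix(c(-1, 2, -3, 4, 5, -6), nrow = 2)

# ifelse keeps the dimensions of the test (m < 0)
clamped <- ifelse(m < 0, 0, m)
dim(clamped)  # 2 3

# Subset replacement on a copy gives the same matrix
m2 <- m
m2[m2 < 0] <- 0
all(clamped == m2)  # TRUE
```

Keep in mind that ifelse returns a new object, so the result has to be assigned back if you want to replace the original.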

Upvotes: 12
