user4999605
user4999605

Reputation: 441

N/A out certain rows in R using ifelse dplyr

I have a data frame like this

Type Species Letter Number Batch
  a     X      Al      1     H
  a     Y      Si      6     H

  b     Z      Mn      3     R
  b     Q      Qp      9     R

  c     L      Tw      10    R
  c     S      Rl      5     R

Where I have used group_by(type) I want to write a statement that looks at the BATCH column, and if it is R, it will then look at the NUMBER column, and look at the lowest number between the two, and then set both the LETTER and NUMBER for that row to NA. Not even sure if this is possible, but this is what it would look like in the end

Type Species Letter Number Batch
  a     X      Al      1     H
  a     Y      Si      6     H

  b     Z      NA      NA    R
  b     Q      Qp      9     R

  c     L      Tw      10    R
  c     S      NA      NA    R

Upvotes: 0

Views: 585

Answers (3)

Steven Beaupré
Steven Beaupré

Reputation: 21641

Another idea using replace() and which.min():

df %>%
  group_by(Type) %>%
  mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number))

Essentially you can read it like this:

Group df by Type then, if Batch == "R" replace the minimum Number value in the group by NA, else, return the original Number value

Which gives:

#Source: local data frame [6 x 5]
#Groups: Type
#
#  Type Species Letter Number Batch
#1    a       X     Al      1     H
#2    a       Y     Si      6     H
#3    b       Z     Mn     NA     R
#4    b       Q     Qp      9     R
#5    c       L     Tw     10     R
#6    c       S     Rl     NA     R

Benchmark

df2 <- df[rep(row.names(df), 10e5),]

library(microbenchmark)
mbm <- microbenchmark(
  Gregor = df2 %>%
    group_by(Type) %>%
    mutate(make_na = Batch == "R" & Number == min(Number),
           Number = ifelse(make_na, NA, Number),
           Letter = ifelse(make_na, NA, Letter)) %>%
    select(-make_na),
  Steven = df2 %>%
    group_by(Type) %>%
    mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number)),
  times = 10, unit = "relative")

enter image description here

# Unit: relative
#    expr      min       lq     mean   median       uq      max neval cld
#  Gregor 1.863925 2.230475 2.065081 2.220267 2.004923 1.919964    10   b
#  Steven 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a 

Upvotes: 4

Gregor Thomas
Gregor Thomas

Reputation: 146090

Easy, just mapping your words to code. I create an intermediate column to mark the rows that need to be NA-ified, then I remove it at then end.

your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), 1, 0),
       Number = ifelse(make_na == 1, NA, Number),
       Letter = ifelse(make_na == 1, NA, Letter)) %>%
select(-make_na)

We can simplify a little bit:

# same code as above, using TRUE/FALSE instead of 1/0
your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), TRUE, FALSE),
       Number = ifelse(make_na, NA, Number),
       Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)

Or even a little more, getting rid of the first ifelse() altogether. Whenever you have ifelse(..., TRUE, FALSE), the ifelse() is unnecessary, it's returning the same thing as its first argument

# make_na column is created directly as a logical column
your_grouped_df %>%
mutate(make_na = Batch == "R" & Number == min(Number),
       Number = ifelse(make_na, NA, Number),
       Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)

Upvotes: 3

user227710
user227710

Reputation: 3194

I know you are looking for dplyr solution, but it might be worth looking also at data.table solution:

library(data.table)

    setDT(df)[Batch=="R",`:=`(Letter=ifelse(Letter==Letter[which.min(Number)],"NA",Letter),Number=ifelse(Number==min(Number),as.integer(NA),Number)),by=Type]
   Type Species Letter Number Batch
1:    a       X     Al      1     H
2:    a       Y     Si      6     H
3:    b       Z     NA     NA     R
4:    b       Q     Qp      9     R
5:    c       L     Tw     10     R
6:    c       S     NA     NA     R

Data

   df<-structure(list(Type = c("a", "a", "b", "b", "c", "c"), Species = c("X", 
"Y", "Z", "Q", "L", "S"), Letter = c("Al", "Si", "Mn", "Qp", 
"Tw", "Rl"), Number = c(1L, 6L, 3L, 9L, 10L, 5L), Batch = c("H", 
"H", "R", "R", "R", "R")), .Names = c("Type", "Species", "Letter", 
"Number", "Batch"), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 1

Related Questions