Reputation: 441
I have a data frame like this
Type Species Letter Number Batch
a X Al 1 H
a Y Si 6 H
b Z Mn 3 R
b Q Qp 9 R
c L Tw 10 R
c S Rl 5 R
Where I have used group_by(type) I want to write a statement that looks at the BATCH column, and if it is R, it will then look at the NUMBER column, and look at the lowest number between the two, and then set both the LETTER and NUMBER for that row to NA. Not even sure if this is possible, but this is what it would look like in the end
Type Species Letter Number Batch
a X Al 1 H
a Y Si 6 H
b Z NA NA R
b Q Qp 9 R
c L Tw 10 R
c S NA NA R
Upvotes: 0
Views: 585
Reputation: 21641
Another idea using replace()
and which.min()
:
df %>%
group_by(Type) %>%
mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number))
Essentially you can read it like this:
Group
df
byType
then, ifBatch == "R"
replace the minimumNumber
value in the group byNA
, else, return the originalNumber
value
Which gives:
#Source: local data frame [6 x 5]
#Groups: Type
#
# Type Species Letter Number Batch
#1 a X Al 1 H
#2 a Y Si 6 H
#3 b Z Mn NA R
#4 b Q Qp 9 R
#5 c L Tw 10 R
#6 c S Rl NA R
Benchmark
df2 <- df[rep(row.names(df), 10e5),]
library(microbenchmark)
mbm <- microbenchmark(
Gregor = df2 %>%
group_by(Type) %>%
mutate(make_na = Batch == "R" & Number == min(Number),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na),
Steven = df2 %>%
group_by(Type) %>%
mutate(Number = ifelse(Batch == "R", replace(Number, which.min(Number), NA), Number)),
times = 10, unit = "relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Gregor 1.863925 2.230475 2.065081 2.220267 2.004923 1.919964 10 b
# Steven 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10 a
Upvotes: 4
Reputation: 146090
Easy, just mapping your words to code. I create an intermediate column to mark the rows that need to be NA
-ified, then I remove it at then end.
your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), 1, 0),
Number = ifelse(make_na == 1, NA, Number),
Letter = ifelse(make_na == 1, NA, Letter)) %>%
select(-make_na)
We can simplify a little bit:
# same code as above, using TRUE/FALSE instead of 1/0
your_grouped_df %>%
mutate(make_na = ifelse(Batch == "R" & Number == min(Number), TRUE, FALSE),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)
Or even a little more, getting rid of the first ifelse()
altogether.
Whenever you have ifelse(..., TRUE, FALSE)
, the ifelse()
is unnecessary, it's returning the same thing as its first argument
# make_na column is created directly as a logical column
your_grouped_df %>%
mutate(make_na = Batch == "R" & Number == min(Number),
Number = ifelse(make_na, NA, Number),
Letter = ifelse(make_na, NA, Letter)) %>%
select(-make_na)
Upvotes: 3
Reputation: 3194
I know you are looking for dplyr
solution, but it might be worth looking also at data.table
solution:
library(data.table)
setDT(df)[Batch=="R",`:=`(Letter=ifelse(Letter==Letter[which.min(Number)],"NA",Letter),Number=ifelse(Number==min(Number),as.integer(NA),Number)),by=Type]
Type Species Letter Number Batch
1: a X Al 1 H
2: a Y Si 6 H
3: b Z NA NA R
4: b Q Qp 9 R
5: c L Tw 10 R
6: c S NA NA R
df<-structure(list(Type = c("a", "a", "b", "b", "c", "c"), Species = c("X",
"Y", "Z", "Q", "L", "S"), Letter = c("Al", "Si", "Mn", "Qp",
"Tw", "Rl"), Number = c(1L, 6L, 3L, 9L, 10L, 5L), Batch = c("H",
"H", "R", "R", "R", "R")), .Names = c("Type", "Species", "Letter",
"Number", "Batch"), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 1