Reputation: 105
I would like to create a new column of data based on a value within a grouped variable.
For instance, given the following dataset, I would like to assign all rows of a group to have the value TRUE if the first time point has a value of "blue".
Here is an example of my dataset:
group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)
df <- data.frame(group, time, color, first_row_blue)
df
The last column is what I would like the data to have, however I don't want it hardcoded.
I have tried this:
df %>%
group_by(group) %>%
mutate(all_blue = ifelse(time == "t1" & color == "blue", TRUE, FALSE))
However, it only assigns a value of TRUE to the first row, even though the data is grouped. Is there a function I am missing that would allow me to do what I am looking for?
I have also seen posts for doing something such as a count of the number of times that "blue" is written, however this will not work as it is present in times other than t1.
Thank you in advance for any advice!
Upvotes: 2
Views: 316
Reputation: 73572
Look up if in groups any
row has color == 'blue' & time == 't1'
using ave
.
> transform(df, blue_t1=ave(color == 'blue' & time == 't1', group, FUN=any))
group time color first_row_blue blue_t1
1 A t1 blue TRUE TRUE
2 A t2 red TRUE TRUE
3 A t3 green TRUE TRUE
4 A t4 yellow TRUE TRUE
5 B t1 yellow FALSE FALSE
6 B t2 green FALSE FALSE
7 B t3 purple FALSE FALSE
8 B t4 blue FALSE FALSE
9 C t1 blue TRUE TRUE
10 C t2 green TRUE TRUE
11 C t3 yellow TRUE TRUE
12 C t4 red TRUE TRUE
Upvotes: 1
Reputation: 102529
If you have blue
in each group anyway, in base R, you can try ave
+ which.max
like below
> transform(df, first_row_blue = ave(color == "blue", group, FUN = which.max) == 1)
group time color first_row_blue
1 A t1 blue TRUE
2 A t2 red TRUE
3 A t3 green TRUE
4 A t4 yellow TRUE
5 B t1 yellow FALSE
6 B t2 green FALSE
7 B t3 purple FALSE
8 B t4 blue FALSE
9 C t1 blue TRUE
10 C t2 green TRUE
11 C t3 yellow TRUE
12 C t4 red TRUE
Otherwise, for general cases where there might be no blue
in a group, you can use
> transform(df, first_row_blue = ave(color == "blue", group, FUN = \(x) 1 %in% which(x)))
group time color first_row_blue
1 A t1 blue TRUE
2 A t2 red TRUE
3 A t3 green TRUE
4 A t4 yellow TRUE
5 B t1 yellow FALSE
6 B t2 green FALSE
7 B t3 purple FALSE
8 B t4 blue FALSE
9 C t1 blue TRUE
10 C t2 green TRUE
11 C t3 yellow TRUE
12 C t4 red TRUE
Upvotes: 2
Reputation: 7979
Moving my comment to an answer (with some explanation).
You are close! Try
library(dplyr)
df %>%
mutate(all_blue = ifelse(first(color) == "blue", TRUE, FALSE), .by = group)
I.e., remove time == "t1" &
, and add first()
. Even more concise, removing ifelse()
, as it is not needed (and considered to be slow):
df %>%
mutate(all_blue = first(color) == "blue", .by = group)
Or base:
tapply(df, group, \(x) transform(x, all_blue = color[[1L]] == "blue")) |>
Reduce(f = rbind)
what I prefer. dplyr::first()
is mainly a wrapper for x[[1L]]
, where x
is a vector, i.e, myfirst = \(x) x[[1L]]
.
Upvotes: 2
Reputation: 33603
An alternative variation on the original attempt.
df %>%
group_by(group) %>%
mutate(all_blue = color[time == "t1"] == "blue")
Upvotes: 3
Reputation: 7540
library(tidyverse)
group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)
df <- data.frame(group, time, color, first_row_blue)
df %>%
group_by(group) %>%
mutate(blue2 = ifelse(time == "t1" & color == "blue", TRUE, NA)) |>
fill(blue2) %>%
mutate(blue2 = replace_na(blue2, FALSE))
#> # A tibble: 12 × 5
#> # Groups: group [3]
#> group time color first_row_blue blue2
#> <chr> <chr> <chr> <lgl> <lgl>
#> 1 A t1 blue TRUE TRUE
#> 2 A t2 red TRUE TRUE
#> 3 A t3 green TRUE TRUE
#> 4 A t4 yellow TRUE TRUE
#> 5 B t1 yellow FALSE FALSE
#> 6 B t2 green FALSE FALSE
#> 7 B t3 purple FALSE FALSE
#> 8 B t4 blue FALSE FALSE
#> 9 C t1 blue TRUE TRUE
#> 10 C t2 green TRUE TRUE
#> 11 C t3 yellow TRUE TRUE
#> 12 C t4 red TRUE TRUE
Created on 2024-05-06 with reprex v2.1.0
Upvotes: 1
Reputation: 56219
Try first
:
df %>%
group_by(group) %>%
mutate(x = first(color) == "blue")
Upvotes: 3