eycramer
eycramer

Reputation: 105

R function for assigning value to grouped data based on a single row condition

I would like to create a new column of data based on a value within a grouped variable. 

For instance, given the following dataset, I would like to assign all rows of a group to have the value TRUE if the first time point has a value of "blue". 

Here is an example of my dataset:

group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)

df <- data.frame(group, time, color, first_row_blue)

df

example_data

The last column is what I would like the data to have, however I don't want it hardcoded.

I have tried this: 

df %>%
  group_by(group) %>% 
  mutate(all_blue = ifelse(time == "t1" & color == "blue", TRUE, FALSE))

However, it only assigns a value of TRUE to the first row, even though the data is grouped.  Is there a function I am missing that would allow me to do what I am looking for? 

I have also seen posts for doing something such as a count of the number of times that "blue" is written, however this will not work as it is present in times other than t1. 

Thank you in advance for any advice! 

Upvotes: 2

Views: 316

Answers (6)

jay.sf
jay.sf

Reputation: 73572

Look up if in groups any row has color == 'blue' & time == 't1' using ave.

> transform(df, blue_t1=ave(color == 'blue' & time == 't1', group, FUN=any))
   group time  color first_row_blue blue_t1
1      A   t1   blue           TRUE    TRUE
2      A   t2    red           TRUE    TRUE
3      A   t3  green           TRUE    TRUE
4      A   t4 yellow           TRUE    TRUE
5      B   t1 yellow          FALSE   FALSE
6      B   t2  green          FALSE   FALSE
7      B   t3 purple          FALSE   FALSE
8      B   t4   blue          FALSE   FALSE
9      C   t1   blue           TRUE    TRUE
10     C   t2  green           TRUE    TRUE
11     C   t3 yellow           TRUE    TRUE
12     C   t4    red           TRUE    TRUE

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 102529

If you have blue in each group anyway, in base R, you can try ave + which.max like below

> transform(df, first_row_blue = ave(color == "blue", group, FUN = which.max) == 1)
   group time  color first_row_blue
1      A   t1   blue           TRUE
2      A   t2    red           TRUE
3      A   t3  green           TRUE
4      A   t4 yellow           TRUE
5      B   t1 yellow          FALSE
6      B   t2  green          FALSE
7      B   t3 purple          FALSE
8      B   t4   blue          FALSE
9      C   t1   blue           TRUE
10     C   t2  green           TRUE
11     C   t3 yellow           TRUE
12     C   t4    red           TRUE

Otherwise, for general cases where there might be no blue in a group, you can use

> transform(df, first_row_blue = ave(color == "blue", group, FUN = \(x) 1 %in% which(x)))
   group time  color first_row_blue
1      A   t1   blue           TRUE
2      A   t2    red           TRUE
3      A   t3  green           TRUE
4      A   t4 yellow           TRUE
5      B   t1 yellow          FALSE
6      B   t2  green          FALSE
7      B   t3 purple          FALSE
8      B   t4   blue          FALSE
9      C   t1   blue           TRUE
10     C   t2  green           TRUE
11     C   t3 yellow           TRUE
12     C   t4    red           TRUE

Upvotes: 2

Friede
Friede

Reputation: 7979

Moving my comment to an answer (with some explanation).

You are close! Try

library(dplyr)
df %>% 
   mutate(all_blue = ifelse(first(color) == "blue", TRUE, FALSE), .by = group)

I.e., remove time == "t1" &, and add first(). Even more concise, removing ifelse(), as it is not needed (and considered to be slow):

df %>% 
   mutate(all_blue = first(color) == "blue", .by = group)

Or base:

tapply(df, group, \(x) transform(x, all_blue = color[[1L]] == "blue")) |>
  Reduce(f = rbind)

what I prefer. dplyr::first() is mainly a wrapper for x[[1L]], where x is a vector, i.e, myfirst = \(x) x[[1L]].

Upvotes: 2

s_baldur
s_baldur

Reputation: 33603

An alternative variation on the original attempt.

df %>%
  group_by(group) %>% 
  mutate(all_blue = color[time == "t1"] == "blue")

Upvotes: 3

Carl
Carl

Reputation: 7540

library(tidyverse)

group <- c("A","A", "A", "A", "B","B","B","B","C", "C", "C","C")
time <- c("t1","t2","t3","t4","t1","t2","t3","t4","t1","t2","t3","t4")
color <- c("blue", "red", "green", "yellow", "yellow","green","purple", "blue", "blue", "green", "yellow","red")
first_row_blue <- c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)

df <- data.frame(group, time, color, first_row_blue)

df %>%
  group_by(group) %>% 
  mutate(blue2 = ifelse(time == "t1" & color == "blue", TRUE, NA)) |> 
  fill(blue2) %>% 
  mutate(blue2 = replace_na(blue2, FALSE))
#> # A tibble: 12 × 5
#> # Groups:   group [3]
#>    group time  color  first_row_blue blue2
#>    <chr> <chr> <chr>  <lgl>          <lgl>
#>  1 A     t1    blue   TRUE           TRUE 
#>  2 A     t2    red    TRUE           TRUE 
#>  3 A     t3    green  TRUE           TRUE 
#>  4 A     t4    yellow TRUE           TRUE 
#>  5 B     t1    yellow FALSE          FALSE
#>  6 B     t2    green  FALSE          FALSE
#>  7 B     t3    purple FALSE          FALSE
#>  8 B     t4    blue   FALSE          FALSE
#>  9 C     t1    blue   TRUE           TRUE 
#> 10 C     t2    green  TRUE           TRUE 
#> 11 C     t3    yellow TRUE           TRUE 
#> 12 C     t4    red    TRUE           TRUE

Created on 2024-05-06 with reprex v2.1.0

Upvotes: 1

zx8754
zx8754

Reputation: 56219

Try first:

df %>% 
  group_by(group) %>% 
  mutate(x = first(color) == "blue")

Upvotes: 3

Related Questions