Mrmoleje
Mrmoleje

Reputation: 493

Mutate and case when issue - dplyr

I have the following data and I was to make a new column using mutate which details when colour = 'g' then take the level on the g row minus the level figure on the 'r' row.

Then likewise with type. Where type = 1 then take the corresponding level minus the level on the type 2 row.

library(dplyr)

d <- tibble(
  date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
  colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
  type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
  level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76))

Just to be clear this is what I want the data to look like.

So the data should look like this:

d2 <- tibble(
    date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
    colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
    type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
    level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76),
  color_gap = c("NULL", 44, "NULL", "NULL", 9, "NULL", "NULL", "NULL", "NULL", "NULL"),
  type_gap = c(11, "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"))

I started to use mutate and case when and got to the below. However, I'm stuck on the final calculation part. How do I say I want to take the color g level - the color r level?

d %>% 
  mutate(color_gap = case_when(color == "g" ~ level)%>%
 mutate(type_gap = case_when(type== "type1" ~ level)%>%
  ) -> d2

Anyone know how to complete this?

Thanks

Upvotes: 1

Views: 117

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28685

This subtracts the first r level from the first g level, second r level from second g level, etc. Same for type1 and type2. This has no checks at all. It doesn't check whether there is a matching r for each g, whether they are in the expected order, whether they are in the same date-group, etc. It assumes the data is already perfectly formatted as expected, so be careful using this on real data.

d %>% 
  mutate(color_gap = replace(rep(NA, n()), colour == 'g', 
                             level[colour == 'g'] - level[colour == 'r']),
         type_gap = replace(rep(NA, n()), type == 'type1', 
                             level[type == 'type1'] - level[type == 'type2']))
# # A tibble: 10 x 6
#    date  colour type  level color_gap type_gap
#    <chr> <chr>  <chr> <dbl>     <dbl>    <dbl>
#  1 2018  none   type1    78        NA       11
#  2 2018  g      none     99        54       NA
#  3 2018  r      none     45        NA       NA
#  4 2019  none   type2    67        NA       NA
#  5 2019  g      none     87         9       NA
#  6 2019  r      none     78        NA       NA
#  7 2020  none   none     89        NA       NA
#  8 2020  none   none     87        NA       NA
#  9 2020  none   none     67        NA       NA
# 10 2020  none   none     76        NA       NA

Upvotes: 3

Wietze314
Wietze314

Reputation: 6020

you could do this with group_by and mutate.

I assumed that there is only 1 row per date that would satisfy each condition.

d %>% 
  mutate(color_gap = case_when(colour == "g" ~ level)) %>%
  mutate(type_gap = case_when(type== "type1" ~ level)) %>%
  group_by(date) %>%
  mutate(diff = max(color_gap,na.rm=T)-max(type_gap, na.rm=T))

Upvotes: 0

Related Questions