HoelR

Reputation: 6563

Case_when statements in pipe operator on a vector

I am trying to understand how to better use if else or case_when in a pipeline when manipulating a vector. After scraping an element of a website I am left with this vector:

[1] "66"        "121"       "112 - 150" "211"       "197"       "25"        "72"       
[8] "59"        "100"       "69 - 194" 

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100", 
"69 - 194")

library(tidyverse)
library(stringr)  # version 1.5.0

I want to manipulate the values while they are still a vector, before I put them in a data frame/tibble: if a string contains two numbers (e.g. 112 - 150), it should be replaced with the mean of the two. I have tried the following:

vector %>%
  case_when(
    str_detect(., "-") ~ . %>%
      str_split_1(" - ") %>%
      as.numeric() %>%
      mean(),
    T ~ .
  ) 

This does not work. Applied to a single string, the splitting step works:

"112 - 150" %>% 
  str_split_1(" - ") %>% 
  as.numeric() %>% 
  mean()

[1] 131

Then I thought perhaps case_when() does not work with a vector. But it clearly does:

case_when(
  vector == "66" ~ "SIXSIX", 
  TRUE ~ "NOT 66"
)

 [1] "SIXSIX" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66"
 [9] "NOT 66" "NOT 66"

I would prefer a solution that avoids a conventional if statement such as:

vector %>% 
  {if (cond) ** else **}

Upvotes: 5

Views: 493

Answers (2)

akrun

Reputation: 887068

One option in base R is to use read.table to read the data into a two-column data.frame and then take rowMeans:

rowMeans(read.table(text = vector, header = FALSE, sep = '-', 
  fill = TRUE), na.rm = TRUE)

Output:

 [1]  66.0 121.0 131.0 211.0 197.0  25.0  72.0  59.0 100.0 131.5

Or in a pipe (the _ placeholder requires R 4.2 or later):

vector |> 
  read.table(text = _, header = FALSE, sep = '-', fill = TRUE) |> 
  rowMeans(na.rm = TRUE)
#[1]  66.0 121.0 131.0 211.0 197.0  25.0  72.0  59.0 100.0 131.5
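To see why this works, it can help to look at the intermediate data frame. The sketch below assumes the vector from the question and relies on read.table's default column names V1 and V2 (the names parsed, single and means are only illustrative). Single numbers get NA in the second column, which rowMeans then skips because of na.rm = TRUE.

```r
vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# Each element is read as one line; "-" splits a range into two fields,
# and fill = TRUE pads single numbers with NA in the second column.
parsed <- read.table(text = vector, header = FALSE, sep = "-", fill = TRUE)

# Rows that held a single number have a missing second column
single <- is.na(parsed$V2)

# With na.rm = TRUE, the row mean of (x, NA) is just x
means <- rowMeans(parsed, na.rm = TRUE)
```

Note that this approach assumes the values are non-negative, since a leading minus sign would also be treated as a separator.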

Upvotes: 1

Maël

Reputation: 51974

When written with a pipe, vector %>% case_when(...) is evaluated as case_when(vector, ...). Every argument of case_when, including the first, must be a two-sided formula, so passing the vector in that position fails with:

Error in case_when(): ! Case 1 (.) must be a two-sided formula, not a character vector.
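As a rough illustration of that behaviour, the sketch below first reproduces the failure and then shows one possible workaround if you do want to stay in a magrittr pipe: wrapping the call in braces stops %>% from inserting the vector as the first argument, so . can be used freely inside. The names piped and result are only illustrative, and suppressWarnings is my addition to silence the coercion warning for the "112 - 150" style elements.

```r
library(dplyr)
library(stringr)
library(purrr)

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# The pipe turns this into case_when(vector, TRUE ~ .), which is an error
piped <- try(vector %>% case_when(TRUE ~ .), silent = TRUE)

# Inside braces the dot is not passed as the first argument
result <- vector %>% {
  case_when(
    str_detect(., "-") ~ map_dbl(str_split(., " - "),
                                 function(x) mean(as.numeric(x))),
    # every right-hand side is evaluated for the whole vector, so the
    # ranges would trigger a coercion warning here without suppressWarnings
    TRUE ~ suppressWarnings(as.numeric(.))
  )
}
```

This keeps case_when in the pipeline, but it is more verbose than calling it on the vector directly.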

In this case, you don't need case_when at all: the mean of a single number is the number itself, so the same code handles both kinds of element:

library(purrr)
library(stringr)
library(dplyr)

vector %>% 
  str_split(' - ') %>% 
  map_dbl(~ mean(as.numeric(.x)))
#[1]  66.0 121.0 131.0 211.0 197.0  25.0  72.0  59.0 100.0 131.5
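The same split-then-average idea can also be written without any packages. This is only a sketch of a base R equivalent, assuming the separator is always exactly " - "; the name means_base is illustrative.

```r
vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# strsplit returns a list with one character vector per element;
# fixed = TRUE treats " - " as a literal string rather than a regex
parts <- strsplit(vector, " - ", fixed = TRUE)

# vapply enforces that each piece yields exactly one numeric value
means_base <- vapply(parts, function(x) mean(as.numeric(x)), numeric(1))
```

vapply plays the role of map_dbl here: it fails loudly if any element does not reduce to a single number.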

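case_when also works if you call it on the vector directly instead of piping into it:

case_when( 
  str_detect(vector, "-") ~ vector %>% 
    str_split(' - ') %>% 
    map_dbl(~ mean(as.numeric(.x))),
  # case_when evaluates every right-hand side on the whole vector, so
  # as.numeric() warns about NAs for the ranges; those NAs are not used
  TRUE ~ as.numeric(vector)
)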

Upvotes: 4
