Nick

Reputation: 145

nested if_else statements in R

I'm working with an animal tracking dataset that contains locations of birds captured across three different states. I'm trying to filter the dataset down to the earliest time each bird crossed a certain latitude, where the threshold latitude varies by state. My data looks something like this:

df <- data.frame(
BirdsID = c("A", "A", "A", 
             "B", "B", "B", 
             "C","C", "C"),

State = c("AR", "AR", "AR", 
           "LA", "LA", "LA", 
           "TN", "TN", "TN"),

Latitude = c(31, 37, 38, 
              29, 31, 32,
              35, 36, 37),

Time = as.Date(c("2020/04/02", "2020/04/03", "2020/04/04", 
          "2020/04/03", "2020/04/04", "2020/04/05",
          "2020/04/05", "2020/04/06", "2020/04/07")))

What I'd like to do is filter the data by Latitude, conditionally upon state. In this case, I'd like to isolate locations of birds from AR above Latitude 36 degrees, LA with Latitude > 30.6, and TN Latitude > 36.5. After filtering, I'd like to distill the data to the minimum time (i.e., the first occasion they were observed above the specified latitude). Here's an attempt which throws an error:

df2 <- df %>%
  if_else(State == "AR", true = filter(Latitude > 36),           #If AR, filter >36deg
         if_else(State == "LA", true = filter(Latitude > 30.6),  #If LA, filter >30.6deg
                false = filter(Latitude > 36.5)                  #else, its TN and filter >36.5                  
                       )
                ) %>%
  group_by(BirdsID) %>%                                          #group by bird
  filter(Time == min(Time))                                      #earliest time above filtered Latitude

The error I'm receiving for this example is Error: "condition" must be a logical vector, not a "data.frame" object. and the error I'm receiving on my actual dataset is Error: "condition" must be a logical vector, not a "tbl_df/tbl/data.frame" object.

Any suggestions or assistance w/ nested ifelse, if_else, or if() statements would be appreciated. Best,

Upvotes: 1

Views: 471

Answers (3)

asenga

Reputation: 21

Here is an example of a nested ifelse formula:

dat1 <- dat %>%
  mutate(group = ifelse((subject > 2 & subject < 21) | subject == 1 | subject == 22, "patient", "control")) %>%
  mutate(condition = ifelse(stim < 9, "pain", ifelse(stim > 8 & stim < 17, "dep", ifelse(stim > 16 & stim < 25, "pos", "neu"))))
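
Applied to the data in the question, the same nesting could compute a per-row latitude cutoff and then filter on it. This is only a sketch: the `cutoff` column name is made up for illustration, and the data frame is rebuilt from the values posted in the question. The key point is that `if_else()` sits inside `mutate()`, so it receives column vectors rather than the whole data frame, which is what the "condition must be a logical vector" error was complaining about.

```r
library(dplyr)

# Data rebuilt from the values in the question
df <- data.frame(
  BirdsID  = rep(c("A", "B", "C"), each = 3),
  State    = rep(c("AR", "LA", "TN"), each = 3),
  Latitude = c(31, 37, 38, 29, 31, 32, 35, 36, 37),
  Time     = as.Date(c("2020/04/02", "2020/04/03", "2020/04/04",
                       "2020/04/03", "2020/04/04", "2020/04/05",
                       "2020/04/05", "2020/04/06", "2020/04/07"))
)

df2 <- df %>%
  # `cutoff` is a hypothetical helper column: the latitude threshold for each row's state
  mutate(cutoff = if_else(State == "AR", 36,
                  if_else(State == "LA", 30.6, 36.5))) %>%  # anything else is treated as TN
  filter(Latitude > cutoff) %>%     # keep locations above the state's threshold
  group_by(BirdsID) %>%             # then, per bird,
  filter(Time == min(Time)) %>%     # keep the earliest remaining observation
  ungroup()
```

With more than a handful of states the nesting gets hard to read, which is where `case_when()` or a lookup table tends to be easier to maintain.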

Upvotes: 0

Sinh Nguyen

Reputation: 4487

Here is a way to do what you want with group_map from dplyr:


library(dplyr)
# Define a function to apply custom filter by state
custom_filter <- function(data) {
  State <- first(data[["State"]])
  if (State == "AR") {
    result <- data %>% 
      filter(Latitude > 36)
  } else if (State == "LA") {
    result <- data %>% 
      filter(Latitude > 30.6)
  } else {
    result <- data %>% 
      filter(Latitude > 36.5)
  }
  result
}

df2 <- df %>%
  group_by(State) %>%
  group_map(.f = ~ custom_filter(.x), .keep = TRUE) %>%
  bind_rows() %>%
  group_by(BirdsID) %>%    
  filter(Time == min(Time))

Output

# A tibble: 3 x 4
# Groups:   BirdsID [3]
  BirdsID State Latitude Time      
  <chr>   <chr> <chr>    <date>    
1 A       AR    37       2020-04-03
2 B       LA    31       2020-04-04
3 C       TN    37       2020-04-07
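
If the list of states grows, the if/else chain in `custom_filter` could possibly be swapped for a named-vector lookup. The sketch below is an alternative, not the approach above: `cutoffs` and `df3` are names invented here, and the data frame is rebuilt from the question's values. One behavioural difference to be aware of: a state missing from `cutoffs` yields `NA` and is dropped by `filter()`, whereas the `else` branch above treats every other state as TN.

```r
library(dplyr)

# Data rebuilt from the values in the question
df <- data.frame(
  BirdsID  = rep(c("A", "B", "C"), each = 3),
  State    = rep(c("AR", "LA", "TN"), each = 3),
  Latitude = c(31, 37, 38, 29, 31, 32, 35, 36, 37),
  Time     = as.Date(c("2020/04/02", "2020/04/03", "2020/04/04",
                       "2020/04/03", "2020/04/04", "2020/04/05",
                       "2020/04/05", "2020/04/06", "2020/04/07"))
)

# Hypothetical lookup of latitude thresholds, keyed by state code
cutoffs <- c(AR = 36, LA = 30.6, TN = 36.5)

df3 <- df %>%
  # as.character() guards against State being a factor, which would index by integer code
  filter(Latitude > unname(cutoffs[as.character(State)])) %>%
  group_by(BirdsID) %>%
  filter(Time == min(Time)) %>%   # earliest observation above the threshold
  ungroup()
```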

Upvotes: 0

neilfws

Reputation: 33772

You can use dplyr::case_when instead of multiple or nested ifelse.

I would use it to flag the data, then filter on the flag. Something like this:

library(dplyr)

df %>% 
  mutate(flag = case_when(
    State == "AR" & Latitude > 36 ~ "Y",
    State == "LA" & Latitude > 30.6 ~ "Y",
    State == "TN" & Latitude > 36.5 ~ "Y",
    TRUE ~ "N"
  )) %>% 
  filter(flag == "Y") %>% 
  group_by(State, BirdsID) %>% 
  filter(Time == min(Time))

Result:

# A tibble: 3 x 5
# Groups:   State, BirdsID [3]
  BirdsID State Latitude Time       flag 
  <chr>   <chr>    <dbl> <date>     <chr>
1 A       AR          37 2020-04-03 Y    
2 B       LA          31 2020-04-04 Y    
3 C       TN          37 2020-04-07 Y
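
If you would rather not carry a flag column around, a variant is to let case_when return a logical and pass it straight to filter. This is a sketch under the same assumptions as above; `res` is just an illustrative name, and the data frame is rebuilt here from the question's values so the block runs on its own.

```r
library(dplyr)

# Data rebuilt from the values in the question
df <- data.frame(
  BirdsID  = rep(c("A", "B", "C"), each = 3),
  State    = rep(c("AR", "LA", "TN"), each = 3),
  Latitude = c(31, 37, 38, 29, 31, 32, 35, 36, 37),
  Time     = as.Date(c("2020/04/02", "2020/04/03", "2020/04/04",
                       "2020/04/03", "2020/04/04", "2020/04/05",
                       "2020/04/05", "2020/04/06", "2020/04/07"))
)

res <- df %>%
  filter(case_when(
    State == "AR" ~ Latitude > 36,
    State == "LA" ~ Latitude > 30.6,
    State == "TN" ~ Latitude > 36.5,
    TRUE          ~ FALSE            # drop rows from any other state
  )) %>%
  group_by(State, BirdsID) %>%
  filter(Time == min(Time)) %>%      # earliest observation above the threshold
  ungroup()
```

The rows kept are the same as in the flagged version; only the extra `flag` column is gone.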

Data - please use data.frame!

df <- data.frame(BirdsID = c("A", "A", "A", 
                             "B", "B", "B", 
                             "C","C", "C"), 
                 State = c("AR", "AR", "AR", 
                           "LA", "LA", "LA", 
                            "TN", "TN", "TN"),
                 Latitude = c(31, 37, 38, 
                              29, 31, 32,
                              35, 36, 37), 
                 Time = as.Date(c("2020/04/02", "2020/04/03", "2020/04/04", 
                                  "2020/04/03", "2020/04/04", "2020/04/05",
                                  "2020/04/05", "2020/04/06", "2020/04/07")))

Upvotes: 1
