scarlett rouge

Reputation: 339

Evaluating logical checks stored as static expressions in an R data frame

I have a very large Excel spreadsheet with many 'checks' on observations (300+ columns). Each check is spread across several cells and combines comparison operators (greater than, equals) with some arithmetic (addition, subtraction, multiplication):

df <-data.frame(checkID = c(1,2,3,4), checkpart1 = c(50, 70, 111, 320),
                 checkpart2 = c("+", "==", "*", ">"), checkpart3 = c(18, 17, 6, 3), checkpart4 =  c("==", NA, "-", NA), checkpart5 = c(80, NA,76,NA), checkpart6 = c(NA, NA, "==", NA), checkpart7 = c(NA,NA,590, NA))
  
head(df) ##this is the input
#checkID checkpart1 checkpart2 checkpart3 checkpart4 checkpart5 checkpart6 checkpart7
#1           50          +         18         ==         80       <NA>         NA
#2           70         ==         17       <NA>         NA       <NA>         NA
#3          111          *          6          -         76         ==        590
#4          320          >          3       <NA>         NA       <NA>         NA

This is where I need code that actually evaluates these Excel checks.
Note that some rows have much longer checks than others, so the solution can't rely on specific column names.

#outcome data frame should look like this, where the checks have been conducted:
View(outputchecks)
#checkID
#1   FALSE      
#2   FALSE
#3   TRUE        
#4   TRUE   

Does anyone know of some tidyr/dplyr/other application in R that can execute these 'static functions' in the dataframe?

Thank you!

Upvotes: 3

Views: 72

Answers (4)

Anoushiravan R

Reputation: 21938

You can also use the following solution:

  • I used pmap_lgl to capture each row of the data set as a character vector, omitting the first variable (checkID)
  • Then I dropped all NA values within each row
  • For the formulas (now character vectors) to be evaluated, each one first has to be collapsed into a single character string of length 1
  • Then I used parse_expr from rlang, the counterpart of parse(text = ) / str2lang in base R, to turn the string into an expression
  • Finally I used eval_tidy, the rlang counterpart of eval in base R, to evaluate the expression

rlang is not required here, since base R functions do the same job quite easily, but I wanted to show the alternative.
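As a rough base R sketch of the same steps (not the code from this answer): the loop below handles one row at a time, drops the NA cells, collapses the rest into one string, parses it with str2lang() and evaluates it with eval(). The name output and the row-by-row vapply() are just illustrative choices, and str2lang() assumes R 3.6.0 or later.

```r
# Same example data as in the question.
df <- data.frame(
  checkID    = c(1, 2, 3, 4),
  checkpart1 = c(50, 70, 111, 320),
  checkpart2 = c("+", "==", "*", ">"),
  checkpart3 = c(18, 17, 6, 3),
  checkpart4 = c("==", NA, "-", NA),
  checkpart5 = c(80, NA, 76, NA),
  checkpart6 = c(NA, NA, "==", NA),
  checkpart7 = c(NA, NA, 590, NA)
)

# Evaluate every row; checkID (column 1) is left out of the expression.
output <- vapply(seq_len(nrow(df)), function(i) {
  # Convert each cell of row i to character so numbers and operators can be mixed.
  tokens <- unlist(lapply(df[i, -1], as.character))
  # Drop the unused (NA) cells of this row.
  tokens <- tokens[!is.na(tokens)]
  # Collapse to a single string, e.g. "50 + 18 == 80", and parse it into a call.
  expr <- str2lang(paste(tokens, collapse = " "))
  # Evaluate the call; only base operators are needed here.
  eval(expr, envir = baseenv())
}, logical(1))

output
```

Because vapply() is told to expect logical(1), a row whose expression does not evaluate to a single TRUE/FALSE stops with an error instead of silently producing something else.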

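library(dplyr)   # mutate(), select(), cur_data() and %>%
library(purrr)   # pmap_lgl()
library(rlang)   # parse_expr(), eval_tidy()

# Note: cur_data() is deprecated as of dplyr 1.1.0; pick(!checkID) replaces
# select(cur_data(), !checkID) there.
df %>%
  mutate(output = pmap_lgl(select(cur_data(), !checkID), ~ {
    x <- c(...)[!is.na(c(...))]               # drop the NA cells of this row
    parse_expr(paste(x, collapse = " ")) %>%  # string -> expression
      eval_tidy()                             # expression -> TRUE/FALSE
  }))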

  checkID checkpart1 checkpart2 checkpart3 checkpart4 checkpart5 checkpart6 checkpart7 output
1       1         50          +         18         ==         80       <NA>         NA  FALSE
2       2         70         ==         17       <NA>         NA       <NA>         NA  FALSE
3       3        111          *          6          -         76         ==        590   TRUE
4       4        320          >          3       <NA>         NA       <NA>         NA   TRUE

Upvotes: 4

AnilGoyal

Reputation: 26238

Using pmap:

df <-data.frame(checkID = c(1,2,3,4), checkpart1 = c(50, 70, 111, 320),
                checkpart2 = c("+", "==", "*", ">"), checkpart3 = c(18, 17, 6, 3), checkpart4 =  c("==", NA, "-", NA), checkpart5 = c(80, NA,76,NA), checkpart6 = c(NA, NA, "==", NA), checkpart7 = c(NA,NA,590, NA))


library(tidyverse)
df %>% mutate(exp = pmap_lgl(df[-1], ~ eval(parse(text = paste(na.omit(c(...)), collapse = '')))))
#>   checkID checkpart1 checkpart2 checkpart3 checkpart4 checkpart5 checkpart6
#> 1       1         50          +         18         ==         80       <NA>
#> 2       2         70         ==         17       <NA>         NA       <NA>
#> 3       3        111          *          6          -         76         ==
#> 4       4        320          >          3       <NA>         NA       <NA>
#>   checkpart7   exp
#> 1         NA FALSE
#> 2         NA FALSE
#> 3        590  TRUE
#> 4         NA  TRUE

Created on 2021-07-04 by the reprex package (v2.0.0)


Alternatively, using rowwise() with c_across() on the same df (tidyverse already loaded as above):

df %>% group_by(checkID) %>%
  mutate(across(everything(), ~ifelse(is.na(.), '', as.character(.)))) %>%
  rowwise() %>%
  mutate(exp = eval(parse(text = paste(c_across(everything()), collapse = ''))))

# A tibble: 4 x 9
# Rowwise:  checkID
  checkID checkpart1 checkpart2 checkpart3 checkpart4 checkpart5 checkpart6 checkpart7 exp  
    <dbl> <chr>      <chr>      <chr>      <chr>      <chr>      <chr>      <chr>      <lgl>
1       1 50         +          18         "=="       "80"       ""         ""         FALSE
2       2 70         ==         17         ""         ""         ""         ""         FALSE
3       3 111        *          6          "-"        "76"       "=="       "590"      TRUE 
4       4 320        >          3          ""         ""         ""         ""         TRUE 

Or use transmute() to keep only checkID and the result:

df %>% group_by(checkID) %>%
  mutate(across(everything(), ~ifelse(is.na(.), '', as.character(.)))) %>%
  rowwise() %>%
  transmute(exp = eval(parse(text = paste(c_across(everything()), collapse = ''))))

# A tibble: 4 x 2
# Rowwise:  checkID
  checkID exp  
    <dbl> <lgl>
1       1 FALSE
2       2 FALSE
3       3 TRUE 
4       4 TRUE 

Using summarise() with .groups = 'drop' gives the same result and also drops the grouping:

df %>% group_by(checkID) %>%
  mutate(across(everything(), ~ifelse(is.na(.), '', as.character(.)))) %>%
  rowwise() %>%
  summarise(exp = eval(parse(text = paste(c_across(everything()), collapse = ''))), .groups = 'drop')

# A tibble: 4 x 2
  checkID exp  
    <dbl> <lgl>
1       1 FALSE
2       2 FALSE
3       3 TRUE 
4       4 TRUE 

Upvotes: 4

Martin Gal

Reputation: 16988

Here is a tidyr and dplyr possibility:

library(tidyr)
library(dplyr)

df %>% 
  tibble() %>%
  unite(check, starts_with("checkpart"), sep=" ", na.rm = TRUE) %>% 
  rowwise() %>% 
  mutate(check = eval(str2expression(check))) %>%
  ungroup()

returns

# A tibble: 4 x 2
  checkID check
    <dbl> <lgl>
1       1 FALSE
2       2 FALSE
3       3 TRUE 
4       4 TRUE 

Upvotes: 2

Rui Barradas

Reputation: 76641

Here is a way with eval/parse. Start by forming a string with the operations and then evaluate the expression.

txt <- apply(df[-1], 1, function(x) paste(trimws(x[!is.na(x)]), collapse = ""))
sapply(txt, function(x) eval(parse(text = x)))
#    50+18==80        70==17 111*6-76==590         320>3 
#        FALSE         FALSE          TRUE          TRUE 
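
Since eval(parse(text = ...)) runs whatever the cells contain, it may be worth validating the tokens first when the spreadsheet comes from an untrusted or messy source. The following is only a sketch under assumptions: safe_check and allowed_ops are hypothetical names, the operator list is a guess at what the checks use, and numbers are assumed to be plain decimals (no scientific notation).

```r
# Hypothetical whitelist of operators the checks are allowed to contain.
allowed_ops <- c("+", "-", "*", "/", "==", "!=", ">", "<", ">=", "<=")

# Hypothetical helper: evaluate one row of tokens, or return NA if the row
# contains anything other than plain numbers and whitelisted operators.
safe_check <- function(tokens) {
  tokens <- trimws(tokens[!is.na(tokens)])
  is_number <- grepl("^-?[0-9]+(\\.[0-9]+)?$", tokens)
  is_op <- tokens %in% allowed_ops
  if (length(tokens) == 0 || !all(is_number | is_op)) return(NA)
  tryCatch(
    # isTRUE() maps anything other than a single TRUE to FALSE.
    isTRUE(eval(parse(text = paste(tokens, collapse = " ")))),
    # Malformed expressions such as "50 +" fail to parse and become NA.
    error = function(e) NA
  )
}

ok_row     <- safe_check(c("111", "*", "6", "-", "76", "==", "590"))
failed_row <- safe_check(c("50", "+", "18", "==", "80", NA, NA))
bad_tokens <- safe_check(c("system('ls')", NA))
incomplete <- safe_check(c("50", "+"))
```

With the data frame from the question, this could be applied row by row in the same way as above, for example apply(df[-1], 1, safe_check).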

Upvotes: 4
