alejandro_hagan

Reputation: 1003

How to use the stop function in R to check function parameters

I am trying to create a function that summarizes and categorizes data. As part of that, I want to add some checks to ensure that the parameters passed in are valid, e.g. that the percentages sum to 1, that a variable is character or numeric, etc.

In the example below I am trying to check that the a, b and c parameters sum to 1 and that one of the variables is of class numeric. However, when I pass parameters that don't meet the checks, the code still runs regardless of whether the values sum to 1.

Based on the suggestions, I changed the code to use stopifnot(); however, the second check, is.numeric(), now fails.

At the bottom I have put some examples of how to apply the function to the diamonds dataset. The stopifnot() function doesn't behave the way I expect. I would appreciate any help.

How do I use stopifnot() to check that a + b + c == 1 and that the dim variable is numeric?

library(tidyverse)


customer_segmentation <- function(df,group,dim,a=.7,b=.26,c=.04)
  {
  
 #this is part of the code that I am focused on--------------
  stopifnot(a+b+c==1L,!is.numeric(dim))
  
 #you can ignore the part below---------- 
  
  
  
  df %>% #dataframe
    group_by({{group}}) %>% 

    #creates a bunch of columns
    summarize(  
      across({{ dim }}, #make sure this dimension can be aggregratad, later version will handle ratios
             list(sum=~sum(.,na.rm=TRUE),
                  mean=~mean(.,na.rm=TRUE),
                  n =  ~ n(),
                  median =  ~ median(.,na.rm=TRUE),
                  sd =  ~ sd(.,na.rm=TRUE),
                  mad =  ~ mad(.,na.rm=TRUE), #median absolute deviation
                  aad =  ~ mad(., center =mean(.,na.rm=TRUE),na.rm=TRUE), #average absolute deviation
                  IQR05 = ~quantile(., .05,na.rm=TRUE),
                  IQR25 = ~quantile(., .25,na.rm=TRUE),
                  IQR75 = ~quantile(., .75,na.rm=TRUE),
                  IQR95 = ~quantile(., .95,na.rm=TRUE)
                  ),
             .names = "{.fn}") #gives each column their name
    ) %>%
    ungroup() %>% 

    arrange(desc(sum)) %>% # assuming positive values, descends highest to lowest (should add some logic to switch this)
    
    mutate(cum_sum=cumsum(sum), #cumlative value, if ratio, need some sort of check - need specify
         prop_total=sum/max(cum_sum), #assumes positive values, need check
         cum_prop_total=cumsum(prop_total), #cumsum percent of total
         cum_unit_prop=row_number()/max(row_number()), #unit percent
         group_classification_by_dim=
           case_when(
           cum_prop_total<=a ~"A",
           cum_prop_total<=(a+b) ~"B",
           TRUE ~ "C"),
         dim_threshold=
           case_when(group_classification_by_dim=="A"~a,
                     group_classification_by_dim=="B"~(a+b),
                     TRUE ~ c)
         ) %>% 
    select(-c(prop_total,cum_sum)) %>% 
    relocate(dim_threshold,group_classification_by_dim,cum_prop_total,cum_unit_prop)

}


#this does not work but should (a, b, c sum to 1, x is numeric)

diamonds %>% 
  customer_segmentation(group = clarity,dim=x,a=.7L,b=.2L,c=.1L)

#this does not work, and should not (a, b, c sum to 1.2 rather than 1, x is numeric)

diamonds %>% 
  customer_segmentation(group = clarity,dim=x,a=.9,b=.2,c=.1)


#is numeric
diamonds$x %>% class()

#does not work because can't find "x", however abc works with default values
##object 'x' not found
diamonds %>% 
  customer_segmentation(group = clarity,dim=x)

Upvotes: 0

Views: 800

Answers (1)

Caspar V.

Reputation: 1812

stopifnot()

stopifnot() is commonly used to quit on a failed assertion. It is simple to use:

> a = 5
> b = 6
> stopifnot(a == b)

Error: a == b is not TRUE

It is often helpful to include your own error message for the user. From R 4.0.0 onwards, this is set as the expression's name:

> stopifnot("A and B should be the same, fool!" = a == b)

Error: A and B should be the same, fool!

We can also perform multiple checks in one statement:

stopifnot("A is too small" = a > 0.5,
          "B should be the same as A" = a == b)
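To make the behaviour of several named checks concrete, here is a minimal sketch that wraps them in a function. The function name check_inputs is hypothetical and only serves the illustration. stopifnot() evaluates the checks in order and stops at the first one that is not TRUE; on R 4.0.0 or later the error message is that check's name.

```r
# Hypothetical wrapper around two named assertions.
check_inputs <- function(a, b) {
  stopifnot("A is too small" = a > 0.5,
            "B should be the same as A" = a == b)
  invisible(TRUE)
}

# Both checks would fail here, but only the first failure is reported.
msg <- tryCatch(check_inputs(0.1, 6),
                error = function(e) conditionMessage(e))
print(msg)

# Valid input passes silently and returns TRUE invisibly.
ok <- check_inputs(5, 5)
```

Because only the first failing check is reported, it can help to order the checks from most basic to most specific.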

An alternative idea that adds some extra flexibility in what to print as error message:

a == b || stop('A should be equal to B, but A=', a, ', B=', b)

In this case, if the first part of the OR-statement (a == b) is satisfied, the second part (stop(...)) is not executed.
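A small sketch of this short-circuit pattern using base R's stop(), which pastes its arguments together into the error message. The function name check_equal is an assumption made for the example, and call. = FALSE is an optional choice that keeps the calling expression out of the message.

```r
# Hypothetical helper: fails with a message that includes the actual values.
check_equal <- function(a, b) {
  # stop() is only reached when a == b is FALSE
  a == b || stop("A should be equal to B, but A=", a, ", B=", b,
                 call. = FALSE)
  invisible(TRUE)
}

msg <- tryCatch(check_equal(5, 6),
                error = function(e) conditionMessage(e))
print(msg)
```

Unlike a fixed stopifnot() message, the text here is built at run time, so it can show the offending values.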

on comparing floating point numbers

Because floating point numbers (numbers with decimals) are not completely accurately stored in binary format, there is always a margin of error around comparisons of them.

The machine epsilon, i.e. the smallest positive number x for which 1 + x is still different from 1, may vary per platform:

> .Machine$double.eps
[1] 2.220446e-16
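As an illustration of what this value means in practice, the sketch below adds the epsilon, and then half of it, to 1. It assumes standard IEEE 754 double precision, which is what R uses on common platforms.

```r
eps <- .Machine$double.eps

# Adding eps to 1 gives the next representable double, so the sum differs from 1.
one_plus_eps_is_one <- (1 + eps == 1)
print(one_plus_eps_is_one)       # FALSE

# Half of eps is too small to register next to 1 and is rounded away.
one_plus_half_eps_is_one <- (1 + eps / 2 == 1)
print(one_plus_half_eps_is_one)  # TRUE

# Numbers far smaller than eps can still be stored on their own.
print(eps / 2 > 0)               # TRUE
```

So the epsilon describes the relative spacing of doubles around 1, not the smallest number that can be stored.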

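Note that == applies no tolerance: it compares the stored values exactly. The first two comparisons below come out TRUE only because b is smaller than the spacing between adjacent doubles near 1, so adding or subtracting it is lost to rounding. Consider the following examples: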

> a = 1
> b = 0.00000000000000001

> a - b == 1
[1] TRUE

> a + b == 1
[1] TRUE

> b == 0
[1] FALSE

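To compare with a tolerance, use the function all.equal(). Its default tolerance is 1.5e-8 (about .Machine$double.eps^0.5), and it can be changed with the tolerance argument: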

> format( pi , nsmall=7)
[1] "3.1415927"

> format( 355/113 , nsmall=7)
[1] "3.1415929"

> pi == 355/113
[1] FALSE

> isTRUE(all.equal( pi, 355/113 ))
[1] FALSE

> isTRUE(all.equal( pi, 355/113, tolerance = 0.0000001))
[1] TRUE
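Putting the two ideas together for the question's checks, a minimal sketch might look like the following. The helper name validate_segmentation_args and its dim_name argument (the column name passed as a string) are assumptions made for this example; they are not part of the original function. The sum is compared with all.equal() instead of ==, and the numeric check looks at the column inside the data frame, not at a bare symbol.

```r
# Hypothetical validation helper; dim_name is the column name as a string.
validate_segmentation_args <- function(df, dim_name, a, b, c) {
  stopifnot(
    # tolerant comparison: rounding error in a + b + c is ignored
    "a, b and c must sum to 1" = isTRUE(all.equal(a + b + c, 1)),
    # check the column itself; a missing column gives NULL, which is not numeric
    "dim must name a numeric column of df" = is.numeric(df[[dim_name]])
  )
  invisible(TRUE)
}

df <- data.frame(x = c(1.5, 2.5), g = c("u", "v"))

# Passes even if 0.7 + 0.2 + 0.1 is not exactly 1 in floating point
validate_segmentation_args(df, "x", 0.7, 0.2, 0.1)

# Fails: the shares sum to 1.2
res <- tryCatch(validate_segmentation_args(df, "x", 0.9, 0.2, 0.1),
                error = function(e) conditionMessage(e))
print(res)
```

Inside a function that takes the column unquoted, as in the question, evaluating dim directly makes R look for an object called x, which is why "object 'x' not found" appears. One option under that design is to extract the column first, for example with dplyr::pull(df, {{ dim }}), and pass the result to is.numeric().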

Upvotes: 2
