Reputation: 1003
I am trying to create a function that summarizes and categorizes data. As part of that, I want to add some checks to ensure that the input parameters are valid, e.g. that the percentages sum to 1, that a variable is character or numeric, etc.
In the example below I am trying to check that the a, b, c parameters sum to 1 and that one of the variables is of class numeric. However, when I pass parameters that don't meet the checks, the code still runs regardless of whether the values sum to 1.
Based on the suggestions, I changed the code to use stopifnot(); however, when I add the second check, is.numeric(), it fails.
I put some examples at the bottom showing how to apply the function to the diamonds dataset. The stopifnot function doesn't work like it should. I'd appreciate any help.
How do I use stopifnot() to check that a+b+c == 1 and that the dim variable is numeric?
library(tidyverse)
customer_segmentation <- function(df,group,dim,a=.7,b=.26,c=.04)
{
  #this is the part of the code that I am focused on--------------
  stopifnot(a+b+c==1L,!is.numeric(dim))
  #you can ignore the part below----------
  df %>% #dataframe
    group_by({{group}}) %>%
    #creates a bunch of columns
    summarize(
      across({{ dim }}, #make sure this dimension can be aggregated, later version will handle ratios
             list(sum=~sum(.,na.rm=TRUE),
                  mean=~mean(.,na.rm=TRUE),
                  n = ~ n(),
                  median = ~ median(.,na.rm=TRUE),
                  sd = ~ sd(.,na.rm=TRUE),
                  mad = ~ mad(.,na.rm=TRUE), #median absolute deviation
                  aad = ~ mad(., center =mean(.,na.rm=TRUE),na.rm=TRUE), #average absolute deviation
                  IQR05 = ~quantile(., .05,na.rm=TRUE),
                  IQR25 = ~quantile(., .25,na.rm=TRUE),
                  IQR75 = ~quantile(., .75,na.rm=TRUE),
                  IQR95 = ~quantile(., .95,na.rm=TRUE)
             ),
             .names = "{.fn}") #gives each column its name
    ) %>%
    ungroup() %>%
    arrange(desc(sum)) %>% # assuming positive values, descends highest to lowest (should add some logic to switch this)
    mutate(cum_sum=cumsum(sum), #cumulative value, if ratio, need some sort of check - need to specify
           prop_total=sum/max(cum_sum), #assumes positive values, need check
           cum_prop_total=cumsum(prop_total), #cumsum percent of total
           cum_unit_prop=row_number()/max(row_number()), #unit percent
           group_classification_by_dim=
             case_when(
               cum_prop_total<=a ~"A",
               cum_prop_total<=(a+b) ~"B",
               TRUE ~ "C"),
           dim_threshold=
             case_when(group_classification_by_dim=="A"~a,
                       group_classification_by_dim=="B"~(a+b),
                       TRUE ~ c)
    ) %>%
    select(-c(prop_total,cum_sum)) %>%
    relocate(dim_threshold,group_classification_by_dim,cum_prop_total,cum_unit_prop)
}
#this does not work but should work (a,b,c sum to 1, x is numeric)
diamonds %>%
customer_segmentation(group = clarity,dim=x,a=.7L,b=.2L,c=.1L)
#this does not work, and should not (a,b,c sum to 1.2, not 1; x is numeric)
diamonds %>%
customer_segmentation(group = clarity,dim=x,a=.9,b=.2,c=.1)
#is numeric
diamonds$x %>% class()
#does not work because can't find "x", however abc works with default values
##object 'x' not found
diamonds %>%
customer_segmentation(group = clarity,dim=x)
Upvotes: 0
Views: 800
Reputation: 1812
stopifnot()
stopifnot() is commonly used to quit on a failed assertion. It is simple to use:
> a = 5
> b = 6
> stopifnot(a == b)
Error: a == b is not TRUE
It is often helpful to include your own error message for the user. From R 4.0.0 onwards, this is set as the expression's name:
> stopifnot("A and B should be the same, fool!" = a == b)
Error: A and B should be the same, fool!
We can also perform multiple checks in one statement:
stopifnot("A is too small" = a > 0.5,
"B should be the same as A" = a == b)
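Applied to the question, named checks can also cover the column type. The following is a minimal base-R sketch, not the asker's final code: check_numeric_column and the toy data frame are hypothetical names introduced here for illustration. It assumes dim is passed as a bare column name, which is why is.numeric(dim) fails with "object 'x' not found": x is a column of df, not a variable, so the check has to look at df itself.

```r
# Hypothetical helper: checks that a column of `df` is numeric.
# `dim` is captured unevaluated, so `x` is never looked up as a variable.
check_numeric_column <- function(df, dim) {
  dim_name <- deparse(substitute(dim))  # e.g. the string "x"
  stopifnot(
    "df must be a data frame"      = is.data.frame(df),
    "dim must name a column of df" = dim_name %in% names(df),
    "dim column must be numeric"   = is.numeric(df[[dim_name]])
  )
  invisible(TRUE)
}

# Toy stand-in for diamonds, so the sketch needs no packages
toy <- data.frame(clarity = c("SI1", "VS2"), x = c(3.95, 4.05))

check_numeric_column(toy, x)          # passes silently
# check_numeric_column(toy, clarity)  # stops: "dim column must be numeric"
```

Because substitute() only sees the expression passed to the function it is called in, inside customer_segmentation() the deparse(substitute(dim)) line would need to sit directly in that function's body rather than in a nested helper.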
An alternative idea that adds some extra flexibility in what to print as error message:
a == b || stop('A should be equal to B, but a=', a, ', b=', b)
In this case, if the first part of the OR statement (a == b) is satisfied, the second part (stop(...)) is not executed.
on comparing floating point numbers
Because floating point numbers (numbers with decimals) are not completely accurately stored in binary format, there is always a margin of error around comparisons of them.
The machine epsilon, i.e. the smallest positive number x for which 1 + x != 1, may vary per R installation:
> .Machine$double.eps
[1] 2.220446e-16
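The == operator itself applies no tolerance; it is all.equal() that uses a default tolerance of about .Machine$double.eps^0.5 (1.5e-8). With ==, a difference that is much smaller than .Machine$double.eps relative to the operands is simply lost to rounding. Consider the following examples: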
> a = 1
> b = 0.00000000000000001
> a - b == 1
[1] TRUE
> a + b == 1
[1] TRUE
> b == 0
[1] FALSE
To have control over the tolerance, use the function all.equal():
> format( pi , nsmall=7)
[1] "3.1415927"
> format( 355/113 , nsmall=7)
[1] "3.1415929"
> pi == 355/113
[1] FALSE
> isTRUE(all.equal( pi, 355/113 ))
[1] FALSE
> isTRUE(all.equal( pi, 355/113, tolerance = 0.0000001))
[1] TRUE
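For the check in the question, this suggests replacing a+b+c==1L with a tolerance-based comparison. A minimal sketch follows, assuming a tolerance of 1e-8 is acceptable for proportions; sums_to_one is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper: TRUE when a + b + c equals 1 within `tolerance`.
sums_to_one <- function(a, b, c, tolerance = 1e-8) {
  isTRUE(all.equal(a + b + c, 1, tolerance = tolerance))
}

.7 + .2 + .1 == 1          # FALSE: the sum is stored as 0.9999999999999999
sums_to_one(.7, .2, .1)    # TRUE
sums_to_one(.9, .2, .1)    # FALSE: the sum is 1.2

# Used as a named assertion (names as messages need R >= 4.0.0)
stopifnot("a, b and c must sum to 1" = sums_to_one(.7, .2, .1))
```

Wrapping all.equal() in isTRUE() matters here, because all.equal() returns a character description of the difference, not FALSE, when the values differ.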
Upvotes: 2