Reputation: 1558
I want to create a function in R that will create a numerical column based on a character/categorical column. To do this I need to get the distinct values in the categorical column. I can do this outside a function, but I would like to make a reusable function for it. The issue I've run into is that the same distinct() call that works outside the function doesn't behave the same way inside the function. I've created a demo below:
# test of call to db to numericize
library(dplyr)
DF <- data.frame("a" = c("a","b","c","a","b","c"),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)
catnum <- function(db, inputcolname) {
  x <- distinct(db, inputcolname)
  print(x)
  return(x)
}
y <- distinct(DF, a)
y
catnum(DF, 'a')
While y gives the correct distinct one-column answer (one column with (a,b,c) in it), x within the function is the entire data frame. I have tried both with and without the quotes, as in catnum(DF,a), but the results are the same.
Could someone tell me what is happening or suggest some code that would work?
Upvotes: 1
Views: 317
Reputation: 10437
Your inputs are not the same, and so you get different results. If you give distinct the same arguments you give catnum, you will get the same result:
isTRUE(all.equal(distinct(DF, a),
catnum(DF, "a")))
## [1] FALSE
isTRUE(all.equal(distinct(DF, "a"),
catnum(DF, "a")))
##[1] TRUE
Unfortunately, this does not work:
catnum(DF, a)
## a b c
## 1 a 0.1 a
## 2 b 1.1 b
## 3 c 2.1 c
## 4 a 3.1 d
## 5 b 4.1 e
## 6 c 5.1 f
The reason, as explained in vignette("programming"), is that you must jump through several annoying hoops if you want to write functions that use functions from dplyr. The solution (as you will learn in the vignette) is as follows:
catnum <- function(db, inputcolname) {
  inputcolname <- enquo(inputcolname)
  distinct(db, !!inputcolname)
}
catnum(DF, a)
## a
## 1 a
## 2 b
## 3 c
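In more recent versions of dplyr (those built on rlang 0.4.0 or later), the enquo()/!! pair can be collapsed into the {{ }} ("curly-curly") operator. If you also want the function to accept the column as either a bare name or a string, rlang::ensym() is one option. A minimal sketch, assuming a current dplyr installation; catnum_embrace and catnum_sym are illustrative names only:

```r
library(dplyr)

DF <- data.frame("a" = c("a","b","c","a","b","c"),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# {{ }} captures the bare column name the caller typed and injects it
# into distinct(); equivalent to enquo() followed by !!
catnum_embrace <- function(db, inputcolname) {
  distinct(db, {{ inputcolname }})
}

# ensym() turns either a bare name (a) or a string ("a") into a symbol,
# which !! then injects into distinct()
catnum_sym <- function(db, inputcolname) {
  col <- rlang::ensym(inputcolname)
  distinct(db, !!col)
}

res <- catnum_embrace(DF, a)
res_str <- catnum_sym(DF, "a")
res_bare <- catnum_sym(DF, a)
print(res)
```

All three calls should return the same single-column data frame holding the distinct values of a.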
Or you could conclude that this is all too confusing and do something like
catnum <- function(db, inputcolname) {
  unique(db[, inputcolname, drop = FALSE])
}
catnum(DF, "a")
## a
## 1 a
## 2 b
## 3 c
instead.
Upvotes: 1
Reputation: 20085
One solution is to use the distinct_() function inside your function. distinct() expects a bare column name and doesn't work with a column name stored in a variable. For example, distinct(DF, "a") does not return the distinct values of column a; the actual syntax is distinct(DF, a). Notice the missing quotes. When distinct() is called from inside the function, it sees the variable name inputcolname, which is not a column of DF, so it is evaluated to its value (the string "a") instead of being treated as a column name. Hence the unexpected result. distinct_(), on the other hand, accepts column names supplied as strings.
library(dplyr)
catnum <- function(db, inputcolname) {
  # distinct_() takes the column name as a string (standard evaluation)
  x <- distinct_(db, inputcolname)
  #print(x)
  return(x)
}
# With the modified function, the results are as expected.
catnum(DF,'a')
# a
# 1 a
# 2 b
# 3 c
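Note that distinct_() and the other underscore verbs have been deprecated since dplyr 0.7.0 in favour of tidy evaluation. If the column name arrives as a string, one way to avoid them is to select the column with all_of() first and then call distinct() on the result. A sketch, assuming a recent dplyr where all_of() is available; catnum_string is an illustrative name:

```r
library(dplyr)

DF <- data.frame("a" = c("a","b","c","a","b","c"),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# Column name supplied as a string, e.g. "a"
catnum_string <- function(db, inputcolname) {
  db %>%
    select(all_of(inputcolname)) %>%  # keep only the named column
    distinct()                        # then drop duplicate rows
}

res <- catnum_string(DF, "a")
print(res)
```

all_of() raises an error if the named column does not exist, which is usually preferable to silently returning the wrong result.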
Upvotes: 2
Reputation: 1778
Not sure what you are trying to do and where the distinct function is coming from. Are you looking for this?
catnum <- function(DF, var) {
  # number of distinct values in the column named by the string var
  length(unique(DF[[var]]))
}
catnum(DF,'a')
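Since the question's underlying goal is a numeric column derived from the categorical one, the same unique() idea can be extended with match(), which gives each value its position among the distinct values. A sketch under the assumption that codes should follow order of first appearance; add_numeric_code and the "_num" suffix are hypothetical names:

```r
DF <- data.frame("a" = c("a","b","c","a","b","c"),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# hypothetical helper: adds <var>_num holding an integer code per category
add_numeric_code <- function(DF, var) {
  # distinct values, in order of first appearance
  distinct_vals <- unique(DF[[var]])
  # index of each row's value within distinct_vals gives its integer code
  DF[[paste0(var, "_num")]] <- match(DF[[var]], distinct_vals)
  DF
}

res <- add_numeric_code(DF, "a")
res$a_num
```

If alphabetical ordering of the codes is preferred instead, as.integer(factor(DF[[var]])) is an alternative, since factor() sorts its levels by default.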
Upvotes: 1