I would like to apply a given function, "passFailFunc", to a given column of my data frame. Here is an example:
df <- data.frame(A = letters[1:10], B = sample(1:20, 10))
=>
A B
1 a 7
2 b 15
3 c 4
4 d 9
5 e 17
6 f 8
7 g 18
8 h 14
9 i 16
10 j 12
And function definition
passFailFunc <- function(x, th) {
  if (x > th) {
    status <- 'fail'
  } else {
    status <- 'pass'
  }
  status
}
I would like to create a new column "status" in which numbers from column B are considered 'pass' if they are at or below a threshold, say th = 15, and 'fail' otherwise:
df$status <- lapply(df$B, function(x) passFailFunc(x, 15))
=>
A B status
1 a 7 pass
2 b 15 pass
3 c 4 pass
4 d 9 pass
5 e 17 fail
6 f 8 pass
7 g 18 fail
8 h 14 pass
9 i 16 fail
10 j 12 pass
This seems to do the job. However, when I try:
factor(df$status)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
even though the status column appears to be a vector:
> is.vector(df$status)
[1] TRUE
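A quick way to see what actually ended up in the column is str(). The sketch below rebuilds the example with the B values from the printed table above (hard-coded instead of sample(), so the result is reproducible) and a compact but equivalent passFailFunc:

```r
# B values copied from the printed table above instead of sample()
df <- data.frame(A = letters[1:10],
                 B = c(7, 15, 4, 9, 17, 8, 18, 14, 16, 12))

# Same logic as the question's passFailFunc, written more compactly
passFailFunc <- function(x, th) {
  if (x > th) "fail" else "pass"
}

df$status <- lapply(df$B, function(x) passFailFunc(x, 15))

# str() reports status as a list column ("List of 10"), not as chr
str(df)
is.list(df$status)  # TRUE
```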
Question: How do I correctly generate the 'status' column?
You can avoid the error as follows. First, create reproducible data:
set.seed(1)
df <- data.frame(A = letters[1:10], B = sample(1:20, 10))
Then, using your passFailFunc, unlist the result of lapply:
df$status <- unlist(lapply(df$B, function(x) passFailFunc(x, 15)))
factor(df$status)
#[1] pass pass pass fail pass pass pass pass fail pass
#Levels: fail pass
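unlist() works here because every call returns a single string. A slightly stricter variant, sketched below and not part of the original answer, uses base R's vapply(), which checks that each call returns exactly one character value and stops with an error otherwise. The B values are hard-coded from the question's table for reproducibility:

```r
df <- data.frame(A = letters[1:10],
                 B = c(7, 15, 4, 9, 17, 8, 18, 14, 16, 12))

passFailFunc <- function(x, th) {
  if (x > th) "fail" else "pass"
}

# FUN.VALUE = character(1) declares the expected shape of each result,
# so the output is a plain character vector rather than a list
df$status <- vapply(df$B, function(x) passFailFunc(x, 15),
                    FUN.VALUE = character(1))

factor(df$status)
```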
or
factor(df$B<=15, labels=c('fail', 'pass'))
#[1] pass pass pass fail pass pass pass pass fail pass
#Levels: fail pass
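This relies on factor() sorting the logical values FALSE before TRUE, so the first label goes to FALSE (B > 15) and the second to TRUE. If every value happens to fall on the same side of the threshold, only one level exists and the two-element labels vector no longer matches, which raises an error. Spelling out the levels explicitly avoids that; a sketch with the question's B values hard-coded:

```r
B <- c(7, 15, 4, 9, 17, 8, 18, 14, 16, 12)
th <- 15

# levels = c(FALSE, TRUE) fixes the order explicitly, so "fail" always maps
# to FALSE (B > th) and "pass" to TRUE (B <= th), even if one level is unused
status <- factor(B <= th, levels = c(FALSE, TRUE), labels = c("fail", "pass"))
status

# Still two levels when every value passes
all_pass <- factor(c(1, 2, 3) <= th, levels = c(FALSE, TRUE),
                   labels = c("fail", "pass"))
all_pass
```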
or
c('pass', 'fail')[(df$B>15) +1]
#[1] "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "fail" "pass"
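The indexing trick works because arithmetic coerces the logical vector to numbers: FALSE + 1 is 1 and TRUE + 1 is 2, so each element picks either the first ("pass") or the second ("fail") label. Step by step, on a few illustrative values:

```r
B <- c(7, 15, 17)
th <- 15

over <- B > th              # FALSE FALSE TRUE
idx <- over + 1             # 1 1 2: logical coerced to numeric, then shifted to 1-based
choices <- c("pass", "fail")
choices[idx]                # "pass" "pass" "fail"
```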
lapply is essentially a for loop in disguise, and in R it is generally better to avoid loops when a vectorised alternative exists. Your specific function is easily vectorised using ifelse:
df$status <- ifelse(df$B > 15, "fail", "pass")
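ifelse() evaluates the test for the whole column at once and returns a character vector of the same length, so factor() works on the result directly. A short check with the B values from the question's table (hard-coded here), including the boundary value 15, which counts as "pass" because the test is a strict >:

```r
df <- data.frame(A = letters[1:10],
                 B = c(7, 15, 4, 9, 17, 8, 18, 14, 16, 12))

df$status <- ifelse(df$B > 15, "fail", "pass")

str(df$status)    # a plain character vector, not a list
table(df$status)  # counts per label
factor(df$status)
```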
If you still prefer wrapping it in a function, you could try using the data.table package:
passFailFunc <- function(x, th) {
  ifelse(x > th, "fail", "pass")
}
library(data.table)
setDT(df)[, status := lapply(.SD, function(x) passFailFunc(x, 15)), .SDcols = "B"]
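Because this version of passFailFunc is already vectorised, it can also be applied to the whole column in one call without .SD or lapply (in data.table that would be status := passFailFunc(B, 15)). A base R sketch using within(), shown as an alternative to the data.table route, with the question's B values hard-coded:

```r
df <- data.frame(A = letters[1:10],
                 B = c(7, 15, 4, 9, 17, 8, 18, 14, 16, 12))

# Vectorised version from the answer: ifelse() handles the whole vector
passFailFunc <- function(x, th) {
  ifelse(x > th, "fail", "pass")
}

# within() evaluates the assignment with df's columns in scope
# and adds the new status column to the returned data frame
df <- within(df, status <- passFailFunc(B, 15))
levels(factor(df$status))
```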
The reason that factor(df$status) doesn't work for you is that lapply returns a list (see the ?lapply documentation); you can see this by running str(df). If you still want to do it your original way, use sapply instead of lapply.
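The difference is easy to see by comparing the return types directly. sapply() simplifies the result to a character vector when every call returns a single value, while lapply() never simplifies. A minimal comparison on a few illustrative values:

```r
B <- c(7, 15, 17)

passFailFunc <- function(x, th) {
  if (x > th) "fail" else "pass"
}

via_lapply <- lapply(B, function(x) passFailFunc(x, 15))
via_sapply <- sapply(B, function(x) passFailFunc(x, 15))

class(via_lapply)  # "list"
class(via_sapply)  # "character"
str(via_lapply)    # List of 3, one chr element each
str(via_sapply)    # chr [1:3] "pass" "pass" "fail"
```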
The reason that is.vector(df$status) returns TRUE is that a list is a vector in R. Try running:
is.vector(list(a=1))
## [1] TRUE
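To tell a list apart from an atomic vector, is.atomic() and is.list() are more informative checks than is.vector(), which is TRUE for both. A small illustration:

```r
chr <- c("pass", "fail")
lst <- list("pass", "fail")

is.vector(chr); is.vector(lst)  # TRUE TRUE: both count as vectors
is.atomic(chr); is.atomic(lst)  # TRUE FALSE: only chr is atomic
is.list(chr);   is.list(lst)    # FALSE TRUE
```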