Riad

Reputation: 961

Applying a function on a given column

I would like to apply a function "passFailFunc" to a given column of my data frame. Here is an example:

df <- data.frame(A = letters[1:10], B = sample(1:20, 10))
=> 
   A  B
1  a  7
2  b 15
3  c  4
4  d  9
5  e 17
6  f  8
7  g 18
8  h 14
9  i 16
10 j 12

And function definition

passFailFunc <- function(x, th) {
  if (x>th) { 
    status='fail'
  } else {
    status='pass'
  }
  status
}

I would like to create a new column "status" in which numbers from column B are marked 'pass' if they are at or below a threshold, say th=15, and 'fail' otherwise:

df$status <- lapply(df$B, function(x) passFailFunc(x, 15))
=> 
   A  B status
1  a  7   pass
2  b 15   pass
3  c  4   pass
4  d  9   pass
5  e 17   fail
6  f  8   pass
7  g 18   fail
8  h 14   pass
9  i 16   fail
10 j 12   pass

This seems to do the job. However, when I try:

factor(df$status)

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

I get this error even though the status column is reported to be a vector:

> is.vector(df$status)
[1] TRUE

Question: How do I correctly generate the 'status' column?

Upvotes: 1

Views: 186

Answers (2)

akrun

Reputation: 887951

You can avoid the error by:

set.seed(1)
df <- data.frame(A = letters[1:10], B = sample(1:20, 10)) 

Using your passFailFunc

df$status <- unlist(lapply(df$B, function(x) passFailFunc(x, 15)))
factor(df$status)
#[1] pass pass pass fail pass pass pass pass fail pass
#Levels: fail pass

or

factor(df$B<=15, labels=c('fail', 'pass'))
#[1] pass pass pass fail pass pass pass pass fail pass
#Levels: fail pass

or

c('pass', 'fail')[(df$B>15) +1]
#[1] "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "fail" "pass"
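One caveat about the factor() one-liner above: it relies on both TRUE and FALSE occurring in the comparison, otherwise two labels are supplied for a single level and factor() stops with an error. A minimal sketch, assuming hypothetical data in which every value passes, that sets the levels explicitly:

```r
# Hypothetical values (not the question's data): all are <= 15.
B <- c(7, 15, 4, 9)

# Explicit levels keep both labels defined even when one outcome is absent.
status <- factor(B <= 15, levels = c(FALSE, TRUE), labels = c("fail", "pass"))

status
levels(status)
```

Both levels remain defined here, so later code that tabulates or compares against "fail" should keep working on subsets that contain only passes.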

Upvotes: 1

David Arenburg

Reputation: 92300

lapply is essentially a for loop in disguise, and in R it is usually better to use a vectorised alternative when one exists. Your specific function is easily vectorised using ifelse:

df$status <- ifelse(df$B > 15, "fail", "pass")

If you still prefer to keep it as a function, you could use the data.table package:

passFailFunc <- function(x, th) {
  ifelse (x > th, "fail", "pass")
}

library(data.table)
setDT(df)[, status := lapply(.SD, function(x) passFailFunc(x, 15)), .SDcols = "B"]

The reason factor(df$status) doesn't work is that lapply returns a list (see ?lapply); you can confirm this with str(df). If you still want to do it your original way, use sapply instead of lapply.
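As a rough sketch of that last suggestion, the snippet below applies the original scalar passFailFunc with sapply, and also with vapply, which is a stricter alternative not mentioned above. The values in B are hypothetical stand-ins for df$B.

```r
# The question's scalar function, written compactly.
passFailFunc <- function(x, th) {
  if (x > th) "fail" else "pass"
}

B <- c(7, 15, 4, 9, 17)  # hypothetical values standing in for df$B

# sapply simplifies the per-element results to a character vector.
status_s <- sapply(B, passFailFunc, th = 15)

# vapply does the same but checks that each result is a single string.
status_v <- vapply(B, passFailFunc, character(1), th = 15)

str(status_s)
factor(status_v)
```

Either result is an atomic character vector, so factor() accepts it.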

is.vector(df$status) returns TRUE because a list is a vector in R.

Try running:

is.vector(list(a=1))
## [1] TRUE
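To tell a list apart from the atomic vector that factor() expects, is.list and is.atomic are more informative than is.vector. A small sketch with made-up values:

```r
# lapply always returns a list, here with one string per element.
status_list <- lapply(c(7, 17), function(x) if (x > 15) "fail" else "pass")

is.vector(status_list)          # TRUE: a list is a (generic) vector
is.list(status_list)            # TRUE
is.atomic(status_list)          # FALSE: this is why factor() fails
is.atomic(unlist(status_list))  # TRUE: unlist flattens to a character vector
```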

Upvotes: 2
