HollowBastion

Reputation: 223

How to use sapply on several columns of a data frame in R and store the result in a new column

I have a dataframe:

val1 val2 val3 val4 val5
5 2 6 7 2
9 1 5 7 6
2 3 5 7 1

And a function that needs the val2, val3 and val4 values from each row:

aFunction <- function(v2,v3,v4) {
    result = v2*2/v3 + max(max(v2,v3),v4)
    return(result)
}

I need the result of this function to be stored in a new column in the data frame:

val1 val2 val3 val4 val5 result
5 2 4 7 2 8
9 3 2 7 6 10
2 10 5 7 1 14

But I'm not sure how to do this.

I've thought of doing

result = apply(df,function(x) {aFunction(x$val2,x$val3,x$val4)})

but it doesn't seem to work.

Upvotes: 1

Views: 88

Answers (3)

GPierre

Reputation: 903

You need to access the columns differently when calling your function. This solution works for your example:

df <- read.table(text = "val1 val2 val3 val4 val5
5 2 6 7 2
9 3 2 7 6
2 10 5 7 1", header = TRUE)

aFunction <- function(v2,v3,v4) {
 v2*2/v3 + max(max(v2,v3),v4)
}

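# MARGIN = 1 passes each row to the function as a numeric vector, so index it with [ ] rather than $
df$results <- apply(df, 1, function(x) aFunction(x[2], x[3], x[4]))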
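An alternative sketch, not part of the original answer: because aFunction takes three separate arguments, base R's mapply() can feed it the columns element by element, skipping the row-to-vector conversion that apply() performs. The data values below are assumed, taken from the expected-output table in the question.

```r
# Assumed data: values from the expected-output table in the question
df <- data.frame(
  val1 = c(5, 9, 2),
  val2 = c(2, 3, 10),
  val3 = c(4, 2, 5),
  val4 = c(7, 7, 7),
  val5 = c(2, 6, 1)
)

aFunction <- function(v2, v3, v4) {
  v2 * 2 / v3 + max(max(v2, v3), v4)
}

# mapply() calls aFunction once per row, using the i-th element of each column
df$result <- mapply(aFunction, df$val2, df$val3, df$val4)
print(df)  # result column: 8, 10, 14
```

One reason to prefer this: apply() converts the data frame to a matrix first, so a single character column would turn every value into a string, while mapply() reads each column with its own type.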

Note that, although this answer addresses your specific problem, the other answers provide more elegant solutions.

Upvotes: 1

Andrew Jackson

Reputation: 823

You can use the dplyr package, whose verbs describe each step of the process in plain words. Using the second set of numbers in your example, here's what you can do:

zz <- "val1 val2 val3 val4 val5
1 5 2 4 7 2
2 9 3 2 7 6
3 2 10 5 7 1"
Data <- read.table(text=zz, header = TRUE) # Creates the dataframe

library(dplyr)
Data %>%
  rowwise() %>%
  mutate(result = (val2 * 2 / val3) + max(val2, val3, val4))

The pipeline takes your data and tells dplyr to evaluate everything rowwise(). This matters because, without it, max() would return the single maximum over all rows of the data frame instead of the maximum within each row. Finally, mutate() creates a new variable based on the function you supplied.

To save the result, assign it to a new object by putting newdata <- at the beginning of the pipeline.
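To make the role of rowwise() concrete, here is a base R sketch (no dplyr needed) contrasting max(), which collapses its inputs to one number, with pmax(), which compares vectors element by element. The data values are assumed from the question's second table, and the names overall_max and row_max are illustrative only.

```r
# Assumed data: values from the second table in the question
Data <- data.frame(
  val1 = c(5, 9, 2),
  val2 = c(2, 3, 10),
  val3 = c(4, 2, 5),
  val4 = c(7, 7, 7),
  val5 = c(2, 6, 1)
)

# max() collapses all three columns into a single number (10); this is the
# value mutate() would recycle to every row without rowwise()
overall_max <- max(Data$val2, Data$val3, Data$val4)

# pmax() compares the columns element by element, giving one maximum per row
row_max <- pmax(Data$val2, Data$val3, Data$val4)

# Vectorized equivalent of the rowwise() + mutate() pipeline
newdata <- transform(Data, result = val2 * 2 / val3 + pmax(val2, val3, val4))
print(newdata)
```

The same expression, result = val2 * 2 / val3 + pmax(val2, val3, val4), should also work inside dplyr's mutate() without rowwise(), computing the whole column in one vectorized step.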

Upvotes: 1

polka

Reputation: 1529

You should build a general function.

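newFunction <- function(a, b, c) { result <- a*2/b + c; return(result) }

Get the maximum of the three columns for each row. pmax() works element by element, whereas max() would return a single value for the whole data frame.

newConstant <- pmax(df$val2, df$val3, df$val4)

Use sapply over the row indices to apply the function to each row, and assign the result to a new column.

df$result <- sapply(seq_len(nrow(df)), function(i) newFunction(df$val2[i], df$val3[i], newConstant[i]))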

I am not able to run this solution right now, but that setup should work in theory.
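Because newFunction only uses arithmetic, which R vectorizes, a possible simplification (a sketch, not the answerer's code) is to pass whole columns and skip sapply entirely. The data values are assumed from the question's expected-output table; by_row is an illustrative name for a row-by-row cross-check.

```r
# Assumed data: values from the expected-output table in the question
df <- data.frame(
  val1 = c(5, 9, 2),
  val2 = c(2, 3, 10),
  val3 = c(4, 2, 5),
  val4 = c(7, 7, 7),
  val5 = c(2, 6, 1)
)

newFunction <- function(a, b, c) {
  a * 2 / b + c
}

# Whole columns go in at once; pmax() supplies the per-row maximum
df$result <- newFunction(df$val2, df$val3, pmax(df$val2, df$val3, df$val4))

# Row-by-row cross-check with sapply() over the row indices
by_row <- sapply(seq_len(nrow(df)), function(i) {
  newFunction(df$val2[i], df$val3[i], max(df$val2[i], df$val3[i], df$val4[i]))
})
stopifnot(isTRUE(all.equal(df$result, by_row)))
print(df)  # result column: 8, 10, 14
```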

Upvotes: 1
