Ekaterina

Reputation: 195

Iterate over every value in data frame and compare it with a mean within a column, return a data frame

I am struggling with writing a function that would iterate over every value in a data frame and return a data frame only with values that don't meet a threshold but with the same column names.

Here is a dataframe:

salary <- c(21000, 23400, 26800)
bonus <- c(350, 400, 170)
startdate <- as.Date(c('2010-11-1','2010-11-2','2010-11-3'))
df <- data.frame(startdate, salary, bonus)

Here is my function:

def2 <- function(x, column){
  d = NULL
  for (row in 1:nrow(x)) {
    val <- x[row,column]
    dat <- x[row, "startdate"]
    m <- mean(x[,column])
    y <- (as.Date(dat)-2)
    if (val < m) {
      if (val < y) {
        print('Number is too low')
      } else {
        susp_date = paste(dat)
        value = paste(val)
        d = rbind(d, data.frame(susp_date, value))
      }
    } else {
      next
    }
  }
  return (d)
}

So basically, I get more or less the desired output: I can see the values that are less than the mean within a column. Here is the output I get:

   susp_date value
1 2010-11-01 21000
2 2010-11-02 23400

But I want to keep the names and the order of the columns as in the input data frame, and get this view for all the columns, not just for one.

Ideally, I would get a data frame as output with the same columns as the original one, but with each value replaced by 1 if it is lower than the mean within its column AND less than the value that corresponds to (start date - 2 days), and by 0 if these conditions are not met:

   startdate salary bonus
1 2010-11-01      1     0
2 2010-11-02      1     0
3 2010-11-03      0     1

I have tried different methods, including copying the data frame and then filling it dynamically, using lapply (in my case several conditions have to hold), and a mix of both, but without success. Any help would be much appreciated!

Upvotes: 0

Views: 225

Answers (2)

JeanVuda

Reputation: 1778

Here is an answer that doesn't use any libraries. All you have to do is use sapply and ifelse in your function; sapply iterates over each element in the column. Edited to include both conditions:

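def2 <- function(x) {
  m <- mean(x, na.rm = TRUE)
  # 1 where the value is below the column mean (as asked in the question), 0 otherwise
  sapply(x, function(y) {
    ifelse(y < m, 1, 0)
  })
}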

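# Both conditions (assumes dates are sorted ascending and have no duplicates!)
def2 <- function(w, x) {
  m <- mean(x, na.rm = TRUE)
  sapply(seq_along(x), function(y) {
    n <- w[y] - 2      # date two days before the current row
    o <- x[w == n]     # value recorded on that date, if there is one
    # Assumption: with no row for that date, the second condition counts as not met
    if (length(o) == 0) return(0)
    ifelse(x[y] < m & x[y] < o, 1, 0)
  })
}

# Applying the function (repeat for each column you want to flag)
df$salary <- def2(w = df$startdate, x = df$salary)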
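The function above flags one column per call. As a rough sketch of how the same two conditions could be applied to every numeric column at once while keeping the column names and order, something like the following might work. The helper name flag_two_conditions and its date_col argument are made up for illustration, and the sketch assumes that a row with no entry two days earlier should get 0.

```r
# Example data from the question
df <- data.frame(
  startdate = as.Date(c("2010-11-01", "2010-11-02", "2010-11-03")),
  salary    = c(21000, 23400, 26800),
  bonus     = c(350, 400, 170)
)

# Hypothetical helper (not from the original post): flags every numeric column
flag_two_conditions <- function(data, date_col = "startdate") {
  days <- as.numeric(data[[date_col]])
  # Row index of the entry dated two days earlier; NA when there is none
  idx <- match(days - 2, days)
  num_cols <- vapply(data, is.numeric, logical(1))  # Date columns are skipped
  data[num_cols] <- lapply(data[num_cols], function(col) {
    ref <- col[idx]                                 # value two days earlier
    below_mean <- col < mean(col, na.rm = TRUE)
    # Assumption: a missing reference value means the condition is not met
    below_ref <- !is.na(ref) & col < ref
    as.numeric(below_mean & below_ref)
  })
  data
}

flags_both <- flag_two_conditions(df)
print(flags_both)
```

With the three-row example only the last row has an entry two days earlier, so only that row can ever be flagged under this reading of the second condition.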

Upvotes: 1

InfiniteFlash

Reputation: 1058

Looks like this is what you want. This post will be adjusted if it is not.

library(dplyr)

df %>%
  # lambda form; funs() is deprecated since dplyr 0.8.0
  mutate_if(is.numeric, ~ as.numeric(. < mean(.)))

   startdate salary bonus
1 2010-11-01      1     0
2 2010-11-02      1     0
3 2010-11-03      0     1
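For anyone who would rather not load dplyr, the same below-the-mean flagging can probably be done in base R as well. This is only a sketch: flag_below_mean is a hypothetical helper name, and it covers just the mean condition, not the date one.

```r
# Example data from the question
df <- data.frame(
  startdate = as.Date(c("2010-11-01", "2010-11-02", "2010-11-03")),
  salary    = c(21000, 23400, 26800),
  bonus     = c(350, 400, 170)
)

# Hypothetical helper (not from the original post)
flag_below_mean <- function(data) {
  # is.numeric() is FALSE for Date columns, so startdate is left untouched
  num_cols <- vapply(data, is.numeric, logical(1))
  data[num_cols] <- lapply(data[num_cols], function(col) {
    as.numeric(col < mean(col, na.rm = TRUE))  # 1 if below the column mean
  })
  data  # same column names and order as the input
}

flags_mean <- flag_below_mean(df)
print(flags_mean)
```

Assigning into data[num_cols] replaces the columns in place, which is what keeps the original names and ordering.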

Upvotes: 1
