I am struggling to write a function that iterates over every value in a data frame and returns a data frame containing only the values that don't meet a threshold, while keeping the same column names.
Here is a dataframe:
```r
salary <- c(21000, 23400, 26800)
bonus <- c(350, 400, 170)
startdate <- as.Date(c('2010-11-1','2010-11-2','2010-11-3'))
df <- data.frame(startdate, salary, bonus)
```
Here is my function:
```r
def2 <- function(x, column){
  d <- NULL
  for (row in 1:nrow(x)) {
    val <- x[row, column]
    dat <- x[row, "startdate"]
    m <- mean(x[, column])          # mean of the chosen column
    y <- (as.Date(dat) - 2)         # start date minus 2 days (a Date)
    if (val < m) {
      if (val < y) {                # compares the value with the date itself
        print('Number is too low')
      } else {
        susp_date <- paste(dat)
        value <- paste(val)
        d <- rbind(d, data.frame(susp_date, value))
      }
    } else {
      next
    }
  }
  return(d)
}
```
This gives me roughly the output I want: I can see the values that are below the column mean. Here is the output I get for the salary column:
```
   susp_date value
1 2010-11-01 21000
2 2010-11-02 23400
```
But I want to keep the column names and column order of the input data frame, and get this view for all columns, not just one.
Ideally, the output would be a data frame with the same columns as the original, where each value is replaced with 1 if it is lower than its column mean AND lower than the value in the same column on the row whose start date is two days earlier (start date - 2 days), and with 0 if these conditions are not met:
```
   startdate salary bonus
1 2010-11-01      1     0
2 2010-11-02      1     0
3 2010-11-03      0     1
```
I have tried several approaches, including copying the data frame and then filling it dynamically, using lapply (in my case several conditions have to hold), and combinations of both, but without success. Any help would be much appreciated!
Here is an answer that doesn't use any packages. All you have to do is use `sapply` and `ifelse` in your function; `sapply` iterates over each element in the column. (Edited to include both conditions.)
```r
# One condition: 1 if the value is below the column mean, else 0
def2 <- function(x){
  m <- mean(x, na.rm = TRUE)
  sapply(x, function(y){
    ifelse(y < m, 1, 0)
  })
}
```
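Because comparison operators and `ifelse()` are already vectorized in R, the per-element `sapply()` is not strictly required. A minimal sketch, assuming the `df` from the question and a hypothetical helper named `flag_below_mean`, that applies the below-mean flag to every numeric column while keeping the column names and order:

```r
# Recreate the example data from the question
salary <- c(21000, 23400, 26800)
bonus <- c(350, 400, 170)
startdate <- as.Date(c('2010-11-1', '2010-11-2', '2010-11-3'))
df <- data.frame(startdate, salary, bonus)

# Hypothetical helper: 1 where the value is below the column mean, else 0
flag_below_mean <- function(x) {
  as.numeric(x < mean(x, na.rm = TRUE))
}

# Assigning into flags[] keeps the data frame's names and column order;
# non-numeric columns (here startdate) are returned unchanged
flags <- df
flags[] <- lapply(df, function(col) if (is.numeric(col)) flag_below_mean(col) else col)
flags
```

Assigning through `flags[]` rather than to `flags` itself is what preserves the data frame structure: `lapply()` returns a plain list, and `[<-` writes it back column by column.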
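```r
# Both conditions (assumes startdate has no duplicates)
def2 <- function(w, x){
  m <- mean(x, na.rm = TRUE)
  sapply(seq_along(x), function(y){
    n <- w[y] - 2                  # start date two days earlier
    o <- x[w == n]                 # value in the same column on that date
    if (length(o) == 0) return(0)  # no such earlier date: flag as 0
    ifelse(x[y] < m & x[y] < o, 1, 0)
  })
}
# Apply to every column except startdate (column 1), keeping names and order
df[-1] <- lapply(df[-1], function(col) def2(w = df$startdate, x = col))
```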
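The two-condition check can also be written without a per-element loop. The sketch below is one possible variant, not the answer's own method: it uses `match()` to find, for each row, the row whose start date is exactly two days earlier. The helper name `flag_two_conditions` is hypothetical, and rows with no such earlier date are assumed to get 0.

```r
salary <- c(21000, 23400, 26800)
bonus <- c(350, 400, 170)
startdate <- as.Date(c('2010-11-1', '2010-11-2', '2010-11-3'))
df <- data.frame(startdate, salary, bonus)

# Hypothetical helper: 1 if x is below its column mean AND below the value
# on the row dated two days earlier; 0 otherwise, including rows that have
# no date two days earlier. Assumes the dates are unique.
flag_two_conditions <- function(x, dates) {
  prev_row <- match(dates - 2, dates)  # NA when no such date exists
  prev_val <- x[prev_row]              # indexing with NA yields NA
  flag <- x < mean(x, na.rm = TRUE) & x < prev_val
  as.numeric(!is.na(flag) & flag)      # treat NA comparisons as 0
}

flags <- df
num_cols <- vapply(df, is.numeric, logical(1))
flags[num_cols] <- lapply(df[num_cols], flag_two_conditions, dates = df$startdate)
flags
```

With this data only the third row has a date two days earlier (2010-11-01), so only its bonus value (170, below both the mean and 350) is flagged.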
Looks like this is what you want; I'll adjust this answer if it isn't.
```r
library(dplyr)
# Flag every numeric column: 1 if below the column mean, else 0
df %>%
  mutate_if(is.numeric, ~ as.numeric(. < mean(.)))
```
```
   startdate salary bonus
1 2010-11-01      1     0
2 2010-11-02      1     0
3 2010-11-03      0     1
```
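The pipeline above checks only the below-mean condition. A sketch that also applies the "two days earlier" condition, assuming dplyr 1.0.0 or later (where `across()` supersedes `mutate_if()`), unique start dates, and 0 for rows with no date two days earlier:

```r
library(dplyr)

salary <- c(21000, 23400, 26800)
bonus <- c(350, 400, 170)
startdate <- as.Date(c('2010-11-1', '2010-11-2', '2010-11-3'))
df <- data.frame(startdate, salary, bonus)

# Row index of the date two days earlier for each row (NA if absent)
prev_row <- match(df$startdate - 2, df$startdate)

# 1 if below the column mean AND below the value two days earlier;
# coalesce() turns the NA comparisons (no earlier date) into FALSE
flags <- df %>%
  mutate(across(where(is.numeric),
                ~ as.numeric(coalesce(.x < mean(.x) & .x < .x[prev_row], FALSE))))
flags
```

Keeping `prev_row` outside the data frame means `where(is.numeric)` does not pick it up as another column to transform.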