I need to check whether the value in every cell of a data.frame lies within a certain range. I am very new to apply and still need to work on understanding it.
I have 2 data.frames:
blood_df: 158 columns
stat_df: statistics for every column of blood_df
Below is a minimal example. So far I have the following, but it calculates the same result for every cell.
c0 <- c(0,0,0,0)
c1 <- c(1,2,3,4)
c2 <- c(5,6,7,8)
c3 <- c(9,10,11,12)
c4 <- c(13,14,15,16)
blood_df <- data.frame(c0,c1,c2,c3,c4)
stat_df <- data.frame(matrix(ncol = 5, nrow = 6))
colnames(stat_df) <- colnames(blood_df)
rownames(stat_df) <- c("Mean","3*sd","sum", "Mean2","-3*sd","sum2" )
# Rows 1-3: mean, 3*sd and their sum, i.e. the upper bound mean + 3*sd (c0 stays NA)
stat_df[1,2:5] <-apply(blood_df[,2:5], 2, mean, na.rm = TRUE)
stat_df[2,2:5] <-apply(blood_df[1:4,2:5], 2, function(x) 3*sd(x,na.rm=TRUE))
stat_df[3,] <-colSums(stat_df[1:2,])
# Rows 4-6: mean, -3*sd and their sum, i.e. the lower bound mean - 3*sd
stat_df[4,2:5] <-apply(blood_df[,2:5], 2, mean, na.rm = TRUE)
stat_df[5,2:5] <-apply(blood_df[1:4,2:5], 2, function(x) -3*sd(x,na.rm=TRUE))
stat_df[6,] <-colSums(stat_df[4:5,])
blood_df:
## c0 c1 c2 c3 c4
## 1 0 1 5 9 13
## 2 0 2 6 10 14
## 3 0 3 7 11 15
## 4 0 4 8 12 16
stat_df:
## c0 c1 c2 c3 c4
## Mean NA 2.500000 6.500000 10.500000 14.500000
## 3*sd NA 3.872983 3.872983 3.872983 3.872983
## sum NA 6.372983 10.372983 14.372983 18.372983
## Mean2 NA 2.500000 6.500000 10.500000 14.500000
## -3*sd NA -3.872983 -3.872983 -3.872983 -3.872983
## sum2 NA -1.372983 2.627017 6.627017 10.627017
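Since the Mean and Mean2 rows are identical, the only values the check really needs are the "sum" (upper) and "sum2" (lower) rows. As a minimal sketch, these can be computed directly as named vectors with one entry per checked column. The names cols, col_mean, col_sd, upper and lower are illustrative, not from the question:

```r
# Same toy data as in the question
blood_df <- data.frame(c0 = c(0, 0, 0, 0),
                       c1 = c(1, 2, 3, 4),
                       c2 = c(5, 6, 7, 8),
                       c3 = c(9, 10, 11, 12),
                       c4 = c(13, 14, 15, 16))
cols <- 2:5  # hypothetical name: the columns to check

# Per-column mean and standard deviation, ignoring NAs
col_mean <- sapply(blood_df[, cols], mean, na.rm = TRUE)
col_sd   <- sapply(blood_df[, cols], sd,   na.rm = TRUE)

# Upper bound ("sum" row) and lower bound ("sum2" row), named c1..c4
upper <- col_mean + 3 * col_sd
lower <- col_mean - 3 * col_sd
print(rbind(upper, lower))
```

The printed values should match the sum and sum2 rows of stat_df above.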
The part that is not working as I need it:
blood_df[1:4,2:5] <- apply(blood_df[,2:5],2, function(x)
(ifelse((x > (stat_df[3,2:5]))||
(x < (stat_df[6,2:5])), NA, x)))
So far it gives me:
blood_df:
## c0 c1 c2 c3 c4
## 1 0 1 1 1 1
## 2 0 5 5 5 5
## 3 0 NA NA NA NA
## 4 0 NA NA NA NA
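This attempt goes wrong in two ways. First, `||` is the scalar OR: it only looks at the first element of each side (recent R versions raise an error instead when a side has length greater than one), so ifelse() gets a single test value and returns a single value per column. Second, inside apply() each column x is compared with the whole row stat_df[3,2:5], so its first value is checked against c1's bounds rather than its own. That is where 1, 5, NA, NA comes from (1 and 5 lie inside c1's range, 9 and 13 do not), and that length-4 result is then recycled down every column. A small sketch of the element-wise version for one column and its own bounds (the variable names and the out-of-range value 40 are hypothetical):

```r
# Column c1 and its own bounds (values taken from stat_df above)
x     <- c(1, 2, 3, 4)
upper <- 6.372983    # "sum"  row: mean + 3 * sd
lower <- -1.372983   # "sum2" row: mean - 3 * sd

# `|` is vectorised: one TRUE/FALSE per element of x
out_of_range <- (x > upper) | (x < lower)

# ifelse() returns a result as long as its test, so each value is kept or set to NA
masked <- ifelse(out_of_range, NA, x)
print(masked)

# A hypothetical out-of-range value (40) is replaced by NA
y <- c(1, 2, 3, 40)
masked_y <- ifelse((y > upper) | (y < lower), NA, y)
print(masked_y)
```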
What I'd like to get (after checking whether every value lies within the range) is:
blood_df:
## c0 c1 c2 c3 c4
## 1 0 1 5 9 13
## 2 0 2 6 10 14
## 3 0 3 7 11 15
## 4 0 4 8 12 16
If it's not in the range, the value should change to NA.
Thanks!
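Try mapply:
column_range = 2:5
# mapply walks over both data.frames column by column in parallel:
# `blood` is one column of blood_df, `stat` is the matching column of stat_df
blood_df[, column_range] = mapply(function(blood, stat){
  # stat[3] is the "sum" row (mean + 3*sd), stat[6] the "sum2" row (mean - 3*sd);
  # `|` is vectorised (unlike `||`), so every value in the column is tested
  ifelse((blood > stat[3]) | (blood < stat[6]), NA, blood)
},
blood_df[, column_range],
stat_df[, column_range],
SIMPLIFY = FALSE  # return a list of columns rather than simplifying to a matrix
)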
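If all checked columns are numeric, an alternative to the mapply answer (not what that answer does) is to convert them to a matrix and let sweep() pair each column with its own bound, avoiding a per-column function call. A minimal sketch, assuming the bounds are mean ± 3 * sd as in stat_df; the names cols, m, mu, s, upper, lower and out are illustrative:

```r
# Toy data from the question
blood_df <- data.frame(c0 = c(0, 0, 0, 0),
                       c1 = c(1, 2, 3, 4),
                       c2 = c(5, 6, 7, 8),
                       c3 = c(9, 10, 11, 12),
                       c4 = c(13, 14, 15, 16))
cols <- 2:5                      # hypothetical name: columns to check
m <- as.matrix(blood_df[, cols])

# Per-column bounds: mean +/- 3 * sd, ignoring NAs
mu    <- colMeans(m, na.rm = TRUE)
s     <- apply(m, 2, sd, na.rm = TRUE)
upper <- mu + 3 * s
lower <- mu - 3 * s

# sweep() compares every column of m with that column's own bound
out <- sweep(m, 2, upper, `>`) | sweep(m, 2, lower, `<`)

# which() skips NA comparisons, so values that are already NA stay NA
m[which(out)] <- NA
blood_df[, cols] <- m
print(blood_df)
```

For the toy data nothing falls outside mean ± 3 * sd, so the printed data.frame equals the input, matching the desired output in the question.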