Reputation: 45
I have a data frame (df) with 4 columns of values (V1 to V4 columns) that I need to select based on two other columns (max and min columns). My aim is to assign NAs to those values outside of the range set by the max and min columns for each row and calculate the mean of the remaining values.
V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16
The expected output would be:
V1 V2 V3 V4 max min mean
NA NA 6 NA 7 5 6
23 30 NA 17 30 16 35
So far, I can only do this by using the following script to assign NAs...
df$V1 <- ifelse(df$V1 > df$max | df$V1 < df$min, NA, df$V1)
df$V2 <- ifelse(df$V2 > df$max | df$V2 < df$min, NA, df$V2)
df$V3 <- ifelse(df$V3 > df$max | df$V3 < df$min, NA, df$V3)
df$V4 <- ifelse(df$V4 > df$max | df$V4 < df$min, NA, df$V4)
...and then the following to calculate the mean:
df$mean <- rowMeans(df[, 1:4], na.rm = TRUE)
The problem is that the number of columns in the real data will be much larger than 4 and this method seems to require far too much repetition. Is there a better way of doing this in R?
I have tried using data.table to subset the valid values and then the apply function, without success:
df <- df[df[,1:4] <= df$max | df[,1:4] >= df$min, ]
apply(df[,1:4], 1, function(x) mean(x))
Thank you.
Upvotes: 2
Views: 1998
Reputation: 38520
Here is a simple solution with a for loop to fill in the NAs and rowMeans to calculate the mean of each row.
# loop through rows and fill in NA for values outside of min/max
for (i in seq_len(nrow(df))) {
  is.na(df[i, 1:4]) <- df[i, 1:4] < df[i, "min"] | df[i, 1:4] > df[i, "max"]
}
# calculate mean of each row
df$mean <- rowMeans(df[, 1:4], na.rm=TRUE)
This returns
df
V1 V2 V3 V4 max min mean
1 NA NA 6 NA 7 5 6.00000
2 23 30 NA 17 30 16 23.33333
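If the loop turns out to be slow on a large data frame, the same masking can probably be done in one step by converting the value columns to a matrix. This is only a sketch, assuming the value columns are the first four and are all numeric; `m` and `outside` are names introduced here for illustration.

```r
df <- read.table(header = TRUE, text = "V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16")

# assumption: columns 1 to 4 hold the values and are all numeric
m <- as.matrix(df[, 1:4])

# a matrix is stored column by column, so a vector with one element per row
# is recycled down each column: every value is compared with its own row's min/max
outside <- m < df$min | m > df$max
m[outside] <- NA

# write the masked values back and compute the row means
df[, 1:4] <- m
df$mean <- rowMeans(m, na.rm = TRUE)
df
```

This should give the same two rows as the loop above, just without iterating over rows.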
Upvotes: 1
Reputation: 11514
For instance, you could try the following, which works by melting your data first.
# getting your data:
df <- read.table(text="V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16", header = TRUE)
# melting the data:
library(reshape2)
df2 <- melt(df, id.vars = c("max", "min"))
df2
max min variable value
1 7 5 V1 1
2 30 16 V1 23
3 7 5 V2 3
4 30 16 V2 30
5 7 5 V3 6
6 30 16 V3 5
7 7 5 V4 8
8 30 16 V4 17
# I create a new vector with NAs, but you could easily just overwrite the values:
df2$val <- with(df2, ifelse(value > max | value < min, NA, value))
# Cast the data into the old form again.
df3 <- dcast(df2, max + min ~ variable, value.var = "val")
# calculate the rowMeans:
df3$mean <- rowMeans(df3[, 3:6], na.rm = TRUE)
# Doing some cosmetics here to get the same column ordering. Choose your preferred way of rearranging the columns, if required at all.
df3 <- df3[, c(paste0("V", 1:4),"max", "min", "mean") ]
df3
V1 V2 V3 V4 max min mean
1 NA NA 6 NA 7 5 6.00000
2 23 30 NA 17 30 16 23.33333
Note that the only difference from your expected output is that the mean of the second row is lower (23.33 instead of 35). I am not sure how you got a value of 35 there.
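As a quick check on that second row: the values that survive the filter are 23, 30 and 17, so the mean should be 70 / 3. A one-line sketch (the vector `kept` is just a name used here):

```r
# values of row 2 that lie within [16, 30]
kept <- c(23, 30, 17)
mean(kept)  # 23.33333
```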
Upvotes: 2
Reputation: 5456
Try:
df <- read.table(header=TRUE, text="V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16")
df.new <- apply(df[, 1:4], 2, function(x) ifelse(x > df[, 5] | x < df[, 6], NA, x))
df.new <- cbind(df.new, df[, 5:6])
df.new$mean <- rowMeans(df.new[1:4], na.rm = TRUE)
df.new
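Since the real data has many more than four value columns, it may help to pick the columns by name instead of by position. The following is a sketch under the assumption that all value columns are named `V` followed by digits; `value_cols` is a name introduced here, and the pattern would need adapting to the real column names.

```r
df <- read.table(header = TRUE, text = "V1 V2 V3 V4 max min
1 3 6 8 7 5
23 30 5 17 30 16")

# assumption: value columns are named V1, V2, ... (adjust the pattern otherwise)
value_cols <- grep("^V[0-9]+$", names(df), value = TRUE)

# replace out-of-range entries column by column;
# df$min and df$max have one element per row, so they line up with each column
df.new <- df
df.new[value_cols] <- lapply(df[value_cols], function(x) {
  replace(x, x > df$max | x < df$min, NA)
})

# mean of the remaining values in each row
df.new$mean <- rowMeans(df.new[value_cols], na.rm = TRUE)
df.new
```

Selecting by name keeps the code unchanged when columns are added, as long as they follow the same naming scheme.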
Upvotes: 1