sjbka

Reputation: 45

R: selecting row values based on row range

I have a data frame (df) with 4 columns of values (V1 to V4 columns) that I need to select based on two other columns (max and min columns). My aim is to assign NAs to those values outside of the range set by the max and min columns for each row and calculate the mean of the remaining values.

V1   V2   V3   V4   max  min
 1    3    6    8     7    5
23   30    5   17    30   16 

The expected output would be:

V1   V2   V3   V4   max  min  mean
NA   NA    6   NA     7    5     6
23   30   NA   17    30   16    35

So far, I can only do this by using the following script to assign NAs...

df$V1 <- ifelse(df$V1 > df$max | df$V1 < df$min, NA, df$V1)

df$V2 <- ifelse(df$V2 > df$max | df$V2 < df$min, NA, df$V2)

df$V3 <- ifelse(df$V3 > df$max | df$V3 < df$min, NA, df$V3)

df$V4 <- ifelse(df$V4 > df$max | df$V4 < df$min, NA, df$V4)

...and then the following to calculate the mean:

df$mean <- rowMeans(df[, 1:4], na.rm = TRUE)

The problem is that the number of columns in the real data will be much larger than 4 and this method seems to require far too much repetition. Is there a better way of doing this in R?

I have tried using data.table to subset the valid values and then the apply function to average them, without success:

df <- df[df[,1:4] <= df$max | df[,1:4] >= df$min, ]

apply(df[,1:4], 1, function(x) mean(x))

Thank you.

Upvotes: 2

Views: 1998

Answers (3)

lmo

Reputation: 38520

Here is a simple solution with a for loop to fill in the NAs and rowMeans to calculate the mean of each row.

# loop through rows and fill in NA for values outside of min/max
for(i in 1:nrow(df))
    is.na(df[i, 1:4]) <- df[i, 1:4] < df[i, "min"] | df[i, 1:4] > df[i, "max"]

# calculate mean of each row
df$mean <- rowMeans(df[, 1:4], na.rm=TRUE)

This returns

df
  V1 V2 V3 V4 max min     mean
1 NA NA  6 NA   7   5  6.00000
2 23 30 NA 17  30  16 23.33333
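
The same result can probably be had without the explicit loop. The following is only a sketch, assuming the value columns are numeric and sit in positions 1 to 4 as in the example; `vals` is a name introduced here for illustration. It relies on R recycling `df$min` and `df$max` down the columns of a matrix, so each value is compared with the bounds of its own row.

```r
# Example data from the question
df <- read.table(header = TRUE, text = "
V1 V2 V3 V4 max min
 1  3  6  8   7   5
23 30  5 17  30  16")

# Work on a matrix copy of the value columns (assumed to be columns 1:4)
vals <- as.matrix(df[, 1:4])

# df$min and df$max have one entry per row, so they recycle down each
# column and every value is compared with its own row's bounds
vals[vals < df$min | vals > df$max] <- NA

# Write the masked values back and compute the row means
df[, 1:4] <- vals
df$mean <- rowMeans(vals, na.rm = TRUE)
df
```

Because the comparison is done on the whole matrix at once, the number of value columns does not change the code, only the column index.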

Upvotes: 1

coffeinjunky

Reputation: 11514

For instance, you could try the following, which works by melting your data first.

# getting your data:
df <- read.table(text="V1   V2   V3   V4   max  min
                        1    3    6    8     7    5
                       23   30    5   17    30   16", header = TRUE)

# melting the data:
library(reshape2)
df2 <- melt(df, id.vars = c("max", "min"))
df2
  max min variable value
1   7   5       V1     1
2  30  16       V1    23
3   7   5       V2     3
4  30  16       V2    30
5   7   5       V3     6
6  30  16       V3     5
7   7   5       V4     8
8  30  16       V4    17

# I create a new vector with NAs, but you could easily just overwrite the values:
df2$val <- with(df2, ifelse(value > max | value < min, NA, value))

# Cast the data into the old form again.
df3 <- dcast(df2, max + min ~ variable, value.var = "val")
# calculate the rowMeans:
df3$mean <- rowMeans(df3[, 3:6], na.rm = TRUE)

# Doing some cosmetics here to get the same column ordering. Choose your preferred way of rearranging the columns, if required at all.
df3 <- df3[, c(paste0("V", 1:4),"max", "min", "mean") ]
df3
  V1 V2 V3 V4 max min     mean
1 NA NA  6 NA   7   5  6.00000
2 23 30 NA 17  30  16 23.33333

Note that the only difference is that the mean of the second row is lower. I am not sure how you got a value of 35 there.
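
As a quick check of the second row, here is the arithmetic done by hand. This is just a sketch; `row2` and `kept` are names made up for the illustration, and the bounds 16 and 30 are taken from that row's min and max.

```r
# Values of row 2, with its bounds min = 16 and max = 30
row2 <- c(V1 = 23, V2 = 30, V3 = 5, V4 = 17)

# Keep only the values inside [16, 30]; V3 = 5 falls below the minimum
kept <- row2[row2 >= 16 & row2 <= 30]
kept

# Mean of the remaining values: (23 + 30 + 17) / 3
mean(kept)
```

Only V3 is dropped, so the mean is taken over 23, 30 and 17.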

Upvotes: 2

r.user.05apr

Reputation: 5456

Try:

df <- read.table(header=TRUE, text="V1   V2   V3   V4   max  min
 1    3    6    8     7    5
23   30    5   17    30   16")

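# for each value column, set entries above max (column 5) or below min (column 6) to NA
df.new <- apply(df[, 1:4], 2, function(x) ifelse(x > df[, 5] | x < df[, 6], NA, x))
# re-attach the max and min columns (cbind with a data frame returns a data frame)
df.new <- cbind(df.new, df[, 5:6])
df.new$mean <- rowMeans(df.new[1:4], na.rm = TRUE)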
df.new
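
Since the real data has many more than 4 value columns, it may be easier to pick them by name rather than by position. A minimal sketch, assuming the value columns are all named "V" followed by digits (that naming pattern is an assumption, as is the helper name `val_cols`):

```r
df <- read.table(header = TRUE, text = "
V1 V2 V3 V4 max min
 1  3  6  8   7   5
23 30  5 17  30  16")

# Select the value columns by name (assumes they match V1, V2, ...)
val_cols <- grep("^V[0-9]+$", names(df), value = TRUE)

# Replace out-of-range entries with NA, one column at a time;
# df$min and df$max line up with the rows of each column
df[val_cols] <- lapply(df[val_cols], function(x) {
  replace(x, x < df$min | x > df$max, NA)
})

# Row means over the remaining values
df$mean <- rowMeans(df[val_cols], na.rm = TRUE)
df
```

If the columns follow a different naming scheme, only the pattern passed to `grep` would need to change.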

Upvotes: 1
