Kryo

How to compute the average of selected columns

I want to compute the row mean of columns based on these criteria:
a) row mean of the columns whose values are > 0.1 & < 0.9
b) row mean of the columns whose values are > 0.9

Input data frame:
    > df1[35:68,10:13]
            X3322_1       X3322_2         X3322_3         X3322_4     X3322_5
           1.119000      0.1020200       1.183000       1.093800      1.2522000
           1.019500     -0.2394300       3.656900      -0.187350      3.6569000
           2.053900      0.0659420       0.694840       0.481820      1.3587000

Expected output:

> res
          A        B
  0.1020200    1.162
  0            2.777
  0.612        1.7063

Answers (2)

lmo

Here is another base R solution. It may be a bit slow on very large datasets, but it should work well on medium-sized problems. Since no sample data was provided, I created a new data.frame:

# create a 10x10 data.frame with values drawn from N(1, 1)
set.seed(1234)
df <- data.frame(matrix(rnorm(100) + 1, 10))
names(df) <- letters[1:10]

# row means for the first criterion: > 0.1 & < 0.9 (values outside the range become NA)
apply(df[5:8, 2:8], 1, function(i) mean(ifelse(i > 0.1 & i < 0.9, i, NA), na.rm = TRUE))
# row means for the second criterion: > 0.9
apply(df[5:8, 2:8], 1, function(i) mean(ifelse(i > 0.9, i, NA), na.rm = TRUE))

To combine the two results into the A/B columns of the expected output, you could use cbind (rbind would stack them as rows instead).
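Because apply calls the function once per row, a vectorized variant is usually faster on larger data. The sketch below assumes the question's three rows and five shown columns, rebuilt by hand as df1 since no dput was posted; it masks out-of-range values with NA, uses rowMeans, and combines the two criteria with cbind:

```r
# Hypothetical reconstruction of the three rows shown in the question
df1 <- data.frame(
  X3322_1 = c(1.119, 1.0195, 2.0539),
  X3322_2 = c(0.10202, -0.23943, 0.065942),
  X3322_3 = c(1.183, 3.6569, 0.69484),
  X3322_4 = c(1.0938, -0.18735, 0.48182),
  X3322_5 = c(1.2522, 3.6569, 1.3587)
)

m <- as.matrix(df1)

# ifelse keeps the matrix shape of the test; values outside the range become NA.
# rowMeans(..., na.rm = TRUE) then averages the remaining values per row,
# returning NaN for a row with no qualifying values.
A <- rowMeans(ifelse(m > 0.1 & m < 0.9, m, NA), na.rm = TRUE)
B <- rowMeans(ifelse(m > 0.9, m, NA), na.rm = TRUE)

# one column per criterion, one row per original row
res <- cbind(A, B)
print(res)
```

With only these five columns, row 3 of A comes out to about 0.588 (the mean of 0.69484 and 0.48182), and row 2 of A is NaN rather than 0 because no value in that row falls between 0.1 and 0.9.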

Mike H.

One way to do it would be to add row indices and then melt your data frame. Since you didn't provide a dput, I'm just using part of your data. Someone can probably come up with a faster or simpler approach, but here is one:

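library(reshape2)

# a few rows of the question's data (no dput was given)
a <- c(1.119, 1.0195, 2.0539)
b <- c(0.10202, -0.23943, 0.0659)
c <- c(1.183, 3.6569, 0.69840)
df <- data.frame(a = a, b = b, c = c)

# add a row index, then reshape to long format (one row per cell)
df$row <- 1:nrow(df)
df_m <- melt(df, id.vars = "row")

# keep only the values inside each range; everything else becomes NA
df_m$val_1_9 <- ifelse(df_m$value > 0.1 & df_m$value < 0.9, df_m$value, NA)
df_m$val_gt_9 <- ifelse(df_m$value > 0.9, df_m$value, NA)

# mean per original row, ignoring NAs (NaN when no value falls in the range)
res <- aggregate(df_m[, c("val_1_9", "val_gt_9")], list(df_m$row), mean, na.rm = TRUE)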

res
  Group.1 val_1_9 val_gt_9
1       1 0.10202   1.1510
2       2     NaN   2.3382
3       3 0.69840   2.0539
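The same long-format idea can also be done in base R without reshape2. A minimal sketch, assuming the same three-row sample: row() gives each cell's row index, so flattening the matrix yields the long format that melt produces, and tapply takes the per-row mean. The names row_id and res2 are illustrative only.

```r
# Same three-row sample as above (columns a, b, c)
df <- data.frame(a = c(1.119, 1.0195, 2.0539),
                 b = c(0.10202, -0.23943, 0.0659),
                 c = c(1.183, 3.6569, 0.69840))

m <- as.matrix(df)

# Long format: every cell value (column-major) plus the row it came from
value <- as.vector(m)
row_id <- as.vector(row(m))

# Mask values outside each range, then average per original row
val_1_9 <- tapply(ifelse(value > 0.1 & value < 0.9, value, NA), row_id, mean, na.rm = TRUE)
val_gt_9 <- tapply(ifelse(value > 0.9, value, NA), row_id, mean, na.rm = TRUE)

res2 <- data.frame(row = as.integer(names(val_1_9)),
                   val_1_9 = as.vector(val_1_9),
                   val_gt_9 = as.vector(val_gt_9))
print(res2)
```

This should reproduce the aggregate output above, including NaN for row 2 of val_1_9.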
