Gianni D'Adova

Reputation: 39

How to calculate the mean of each row in a data frame? [R]

Here is the df:

    # A tibble: 6 x 5
          t       a       b       c       d
      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    1 3999. 0.00586 0.00986 0.00728 0.00856
    2 3998. 0.0057  0.00958 0.00702 0.00827
    3 3997. 0.00580 0.00962 0.00711 0.00839
    4 3996. 0.00602 0.00993 0.00726 0.00875

I want to get the mean of each row, excluding the first column. The code I wrote:

df$Mean <- rowMeans(df[select(df, -"t")])

The error I get:

    Error: Must subset columns with a valid subscript vector.
    x Subscript `select(group1, -"t")` has the wrong type `tbl_df<
      p2 : double
      p8 : double
      p10: double
      p9 : double
    >`.
    ℹ It must be logical, numeric, or character.

I tried converting df to a matrix, but then I got another error. How should I solve this?

Now I'm trying to calculate the standard error using this code:

se <- function(x){sd(df[,x])/sqrt(length(df[,x]))}
sapply(group1[,2:5],se)

I tried to specify which columns should be used to calculate the standard error, but again an error pops up:

    Error: Must subset columns with a valid subscript vector.
    x Can't convert from `x` <double> to <integer> due to loss of precision.

I have used valid column subscripts, so I don't understand why this error occurs.

Upvotes: 1

Views: 578

Answers (2)

akrun

Reputation: 886938

We can use setdiff to get the names of all columns other than 't' and then pass that subset to rowMeans. Because the column is selected by name rather than by position, this works wherever 't' appears in the data frame.

df$Mean <- rowMeans(df[setdiff(names(df), "t")], na.rm = TRUE)
df
#     t       a       b       c       d      Mean
#1 3999 0.00586 0.00986 0.00728 0.00856 0.0078900
#2 3998 0.00570 0.00958 0.00702 0.00827 0.0076425
#3 3997 0.00580 0.00962 0.00711 0.00839 0.0077300
#4 3996 0.00602 0.00993 0.00726 0.00875 0.0079900
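The na.rm = TRUE argument only matters when a row contains missing values. A minimal sketch on a small hypothetical data frame (toy is invented for illustration, not the question's data) shows the difference:

```r
# Hypothetical toy data frame with one missing value (not from the question)
toy <- data.frame(t = c(1, 2), a = c(1, NA), b = c(3, 4))

# Every column except 't', selected by name
cols <- setdiff(names(toy), "t")

# Default na.rm = FALSE: any NA in a row makes that row's mean NA
rowMeans(toy[cols])                 # row 1: 2, row 2: NA

# na.rm = TRUE: the NA is dropped and the mean uses the remaining values
rowMeans(toy[cols], na.rm = TRUE)   # row 1: 2, row 2: 4
```

Note that with na.rm = TRUE the denominator shrinks to the number of non-missing values in that row, so row means can be based on different counts.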

select from dplyr returns a subset of the data frame, not column names or indices, which is why it cannot be used as a subscript inside df[ ]. We can apply rowMeans to its result directly:

library(dplyr)
rowMeans(select(df, -t), na.rm = TRUE)

Or in a pipe

df <- df %>%
         mutate(Mean = rowMeans(select(., -t), na.rm = TRUE))
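The error in the question says the subscript inside df[ ] must be logical, numeric, or character. As a base R sketch (rebuilding the same data as in the "data" section), here is one valid subscript of each type, all excluding 't':

```r
# Same values as the answer's data
df <- data.frame(t = c(3999, 3998, 3997, 3996),
                 a = c(0.00586, 0.0057, 0.0058, 0.00602),
                 b = c(0.00986, 0.00958, 0.00962, 0.00993),
                 c = c(0.00728, 0.00702, 0.00711, 0.00726),
                 d = c(0.00856, 0.00827, 0.00839, 0.00875))

# Logical: TRUE for every column whose name is not "t"
m_logical <- rowMeans(df[names(df) != "t"])

# Numeric: a negative index drops the position of "t"
m_numeric <- rowMeans(df[-which(names(df) == "t")])

# Character: the names of the columns to keep
m_character <- rowMeans(df[setdiff(names(df), "t")])
```

All three give the same row means. The numeric form has a pitfall: if no column is named "t", which() returns integer(0) and df[-integer(0)] selects no columns at all, so the logical or character forms are safer. With dplyr loaded, names(select(df, -t)) is also a valid character subscript.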

Update

If we need the standard error of each row, we can use apply with MARGIN = 1:

# skip 't' and the Mean column added above (setdiff ignores 'Mean' if it is absent)
apply(df[setdiff(names(df), c('t', 'Mean'))], 1,
      function(x) sd(x)/sqrt(length(x)))
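For reference, the quantity computed for each row is the standard error of the mean, where n is the number of values in the row (4 here, columns a to d) and s is the sample standard deviation. R's sd() uses the n - 1 denominator:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```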

Or with rowSds from matrixStats

library(matrixStats)
cols <- setdiff(names(df), c('t', 'Mean'))  # value columns only
rowSds(as.matrix(df[cols]))/sqrt(length(cols))
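The question's sapply(group1[,2:5], se) suggests a standard error per column was wanted. That call passes each column's values, not its index, to se, which then uses those doubles as a column subscript in df[,x]; that is what triggers "Can't convert from x <double> to <integer>". A minimal sketch, assuming per-column standard errors are the goal, where se works on the vector it receives:

```r
# Same values as the answer's data
df <- data.frame(t = c(3999, 3998, 3997, 3996),
                 a = c(0.00586, 0.0057, 0.0058, 0.00602),
                 b = c(0.00986, 0.00958, 0.00962, 0.00993),
                 c = c(0.00728, 0.00702, 0.00711, 0.00726),
                 d = c(0.00856, 0.00827, 0.00839, 0.00875))

# Standard error of a numeric vector: sample sd divided by sqrt(n)
se <- function(x) sd(x) / sqrt(length(x))

# sapply() hands each column's values to se(), so no indexing happens inside se()
col_se <- sapply(df[setdiff(names(df), "t")], se)
col_se
```

The result is a named numeric vector with one standard error per column (a, b, c, d).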

data

df <- structure(list(t = c(3999, 3998, 3997, 3996), a = c(0.00586, 
0.0057, 0.0058, 0.00602), b = c(0.00986, 0.00958, 0.00962, 0.00993
), c = c(0.00728, 0.00702, 0.00711, 0.00726), d = c(0.00856, 
0.00827, 0.00839, 0.00875)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Upvotes: 0

Duck

Reputation: 39585

A similar base R solution that drops the first column by position (so it assumes 't' is always the first column) would be:

df$Mean <- rowMeans(df[, -1], na.rm = TRUE)

Output:

     t       a       b       c       d      Mean
1 3999 0.00586 0.00986 0.00728 0.00856 0.0078900
2 3998 0.00570 0.00958 0.00702 0.00827 0.0076425
3 3997 0.00580 0.00962 0.00711 0.00839 0.0077300
4 3996 0.00602 0.00993 0.00726 0.00875 0.0079900

Upvotes: 1
