Reputation: 3941
Given a sample data frame:
C1<-c(3,2,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,3)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)
DF
ID C1 C2 C3
1 A 3 3 5
2 B 2 7 4
3 C 4 3 3
4 D 4 4 6
5 E 5 5 3
What is the best way to create a second data frame that would contain the ID
column and the mean of each row? Something like this:
ID Mean
A 3.67
B 4.33
C 3.33
D 4.67
E 4.33
Something similar to:
RM<-rowMeans(DF[,2:4])
I'd like to keep the means aligned with their IDs.
Upvotes: 77
Views: 207914
Reputation: 26238
rowwise() from dplyr can be used in situations like this:
library(dplyr)
DF %>%
  rowwise() %>%
  summarise(ID,
            Mean = mean(c_across(C1:C3)))
#> # A tibble: 5 × 2
#> ID Mean
#> <chr> <dbl>
#> 1 A 3.67
#> 2 B 4.33
#> 3 C 3.33
#> 4 D 4.67
#> 5 E 4.33
If you still want to use rowMeans, it also works in piped syntax:
DF %>%
  mutate(Mean = rowMeans(.[-1]))
#> ID C1 C2 C3 Mean
#> 1 A 3 3 5 3.666667
#> 2 B 2 7 4 4.333333
#> 3 C 4 3 3 3.333333
#> 4 D 4 4 6 4.666667
#> 5 E 5 5 3 4.333333
Here . is the magrittr placeholder: it stands for the left-hand side of %>% (in this case DF), so .[-1] is every column except ID.
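To make the role of the placeholder concrete, the sketch below compares the piped version with a base R call that names DF explicitly. It assumes only dplyr, which re-exports the magrittr %>% pipe.

```r
library(dplyr, warn.conflicts = FALSE)

DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# Inside the pipe, `.` is DF itself, so `.[-1]` drops the ID column
piped <- DF %>% mutate(Mean = rowMeans(.[-1]))

# The same computation with DF written out explicitly
explicit <- transform(DF, Mean = rowMeans(DF[-1]))

all.equal(unname(piped$Mean), unname(explicit$Mean))
```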
Upvotes: 1
Reputation: 2141
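Answer adapted from here, generalised to N different groups of columns. Columns are grouped by their name with the digits removed (C1, C2 and C3 all fall into group "C"), and one mean column is added per group:
library(dplyr, warn.conflicts = FALSE)
library(purrr)
# Split the numeric columns into groups by name minus digits
# (names(.) refers to the piped data; names(df) would hit stats::df),
# then take the row means within each group
row_means <- DF %>%
  dplyr::select(where(is.numeric)) %>%
  split.default(stringr::str_remove(names(.), '[0-9]+')) %>%
  map(rowMeans) %>%
  setNames(paste0("mean_", names(.)))
# Splice the named list in as new columns (mean_C, ...)
DF %>%
  mutate(!!!row_means)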
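The same grouping idea can be sketched in base R without purrr or stringr. As above, it assumes that columns belonging to one group share a name once trailing digits are removed; the extra columns D1 and D2 are hypothetical, added only so there are two groups.

```r
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3),
                 D1 = c(1, 2, 3, 4, 5),  # hypothetical second group
                 D2 = c(3, 2, 1, 0, 1))  # hypothetical second group

# Keep only the numeric columns
num <- DF[vapply(DF, is.numeric, logical(1))]

# Group labels: column names with trailing digits stripped (C1 -> "C")
groups <- sub("[0-9]+$", "", names(num))

# One row-mean vector per group, returned as the columns of a matrix
means <- sapply(split.default(num, groups), rowMeans)
colnames(means) <- paste0("mean_", colnames(means))

result <- cbind(DF, means)
result
```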
Upvotes: 1
Reputation: 56229
Using dplyr:
library(dplyr)
DF %>%
  transmute(ID,
            Mean = rowMeans(across(C1:C3)))
Or
DF %>%
  transmute(ID,
            Mean = rowMeans(select(., C1:C3)))
# ID Mean
# 1 A 3.666667
# 2 B 4.333333
# 3 C 3.333333
# 4 D 4.666667
# 5 E 4.333333
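In dplyr 1.1.0 and later, pick() is the recommended way to grab a set of columns as a data frame inside a verb, and calling across() without a function triggers a deprecation warning there. A minimal sketch, assuming dplyr 1.1.0 or newer is installed:

```r
library(dplyr, warn.conflicts = FALSE)

DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# pick() returns C1:C3 as a data frame, which rowMeans() accepts directly
res <- DF %>%
  transmute(ID, Mean = rowMeans(pick(C1:C3)))
res
```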
Upvotes: 16
Reputation: 558
rowMeans is nice, but if you are still trying to wrap your head around the apply family of functions, this is a good opportunity to begin understanding it.
DF['Mean'] <- apply(DF[,2:4], 1, mean)
Notice that the new column is assigned with a character name (DF['Mean']) rather than with $. Because the name is a string, it can be built from a variable, which makes this approach easier to use inside for loops.
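A minimal sketch of such a loop, assuming hypothetical column groups (the names in col_sets are illustrative, not from the answer): each pass builds the new column name as a string and assigns it with DF[...].

```r
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# Hypothetical groups of columns to average
col_sets <- list(C1_C2 = c("C1", "C2"),
                 all   = c("C1", "C2", "C3"))

for (nm in names(col_sets)) {
  # MARGIN = 1 applies mean() to each row of the selected columns;
  # the target column name is computed from the loop variable
  DF[paste0("Mean_", nm)] <- apply(DF[col_sets[[nm]]], 1, mean)
}
DF
```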
Upvotes: 3
Reputation:
(Another solution using pivot_longer() and pivot_wider(), introduced in tidyr 1.0.0.)
Use pivot_longer() to reshape the data from wide to long form, then summarise by ID. See the tidyr pivoting vignette for details on pivot_longer() and pivot_wider(): https://tidyr.tidyverse.org/articles/pivot.html
library(tidyverse)
C1<-c(3,2,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,3)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)
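The pipeline that produced the output below is missing from this answer. A minimal sketch that yields the same ID/mean table, assuming tidyr 1.0.0 or later and dplyr, lengthens the data and then averages per ID (the names_to label "column" is arbitrary):

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

res <- DF %>%
  # wide -> long: one row per (ID, column) pair, values in `value`
  pivot_longer(-ID, names_to = "column", values_to = "value") %>%
  group_by(ID) %>%
  summarise(mean = mean(value))
res
```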
Output:
ID mean
<fct> <dbl>
1 A 3.67
2 B 4.33
3 C 3.33
4 D 4.67
5 E 4.33
Upvotes: 0
Reputation: 61214
Calculate row means on a subset of columns:
Create a new data.frame that takes the first column of DF as a column called ID, calculates the mean of all the other columns in each row, and stores it in a column called 'Means':
data.frame(ID=DF[,1], Means=rowMeans(DF[,-1]))
ID Means
1 A 3.666667
2 B 4.333333
3 C 3.333333
4 D 4.666667
5 E 4.333333
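DF[,-1] relies on ID being the first column. If the identifier could sit anywhere, a sketch that selects columns by type instead, assuming every numeric column should be averaged:

```r
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# TRUE for every numeric column; ID (character) is FALSE
is_num <- vapply(DF, is.numeric, logical(1))

# DF[is_num] (no comma) always returns a data frame, even for one column
res <- data.frame(ID = DF$ID, Means = rowMeans(DF[is_num]))
res
```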
Upvotes: 66
Reputation: 251
You can add a new column to your data frame with $ holding the row means:
DF$Mean <- rowMeans(DF[,2:4])
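By default rowMeans() returns NA for any row that contains a missing value; its na.rm argument averages only the values that are present. The NA in row B below is hypothetical, added only to show the difference.

```r
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, NA, 4, 4, 5),   # hypothetical missing value in row B
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

DF$Mean <- rowMeans(DF[, 2:4])                     # row B is NA
DF$Mean_narm <- rowMeans(DF[, 2:4], na.rm = TRUE)  # row B is (7 + 4) / 2
DF
```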
Upvotes: 25
Reputation: 19454
Starting with your data frame DF, you could use the data.table package:
library(data.table)
## EDIT: As suggested by @MichaelChirico, setDT converts a
## data.frame to a data.table by reference and is preferred
## if you don't mind losing the data.frame
setDT(DF)
# EDIT: To get the column name 'Mean':
DF[, .(Mean = rowMeans(.SD)), by = ID]
# ID Mean
# [1,] A 3.666667
# [2,] B 4.333333
# [3,] C 3.333333
# [4,] D 4.666667
# [5,] E 4.333333
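Because every ID is unique here, by = ID creates one group per row, which may add overhead on large tables. An alternative sketch computes the means in one vectorised call by listing the value columns in .SDcols (column names taken from the question's DF); the := form adds the column to the table by reference.

```r
library(data.table)

DT <- data.table(ID = c("A", "B", "C", "D", "E"),
                 C1 = c(3, 2, 4, 4, 5),
                 C2 = c(3, 7, 3, 4, 5),
                 C3 = c(5, 4, 3, 6, 3))

# .SD holds only the columns named in .SDcols, so ID is not averaged
res <- DT[, .(ID, Mean = rowMeans(.SD)), .SDcols = c("C1", "C2", "C3")]
res

# Or add the Mean column to DT in place
DT[, Mean := rowMeans(.SD), .SDcols = c("C1", "C2", "C3")]
```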
Upvotes: 33