Reputation: 99
I have a data frame with 68 columns of variables.
> dim(full_data)
[1] 10299 68
Example:
F1 F2 M1 M2 M3 ... M66
1 A 3 5 8 1
1 B 4 1 2 5
1 A 9 8 7 7
I need to average all the columns M1 to M66 by grouping on F1 and F2.
Most methods seem to look something like this: ddply(full_data, c("F1","F2"), summarise, MEAN=mean(M1)), where a new column, MEAN, is specified and created. I don't want to do that for 66 columns. I would prefer the column names to just stay the same.
Example Result:
F1 F2 M1 M2 M3 ... M66
1 A 6 6.5 7.5 4
1 B 4 1 2 5
Upvotes: 1
Views: 300
Reputation: 92282
Assuming your data set is called df:
## install.packages("data.table")
library(data.table)
setDT(df)[, lapply(.SD, mean), by = list(F1, F2)]
## F1 F2 M1 M2 M3 M66
## 1: 1 A 6 6.5 7.5 4
## 2: 1 B 4 1.0 2.0 5
If your data set also has other columns and you want to include only M1:M66, you can use .SDcols as well:
setDT(df)[, lapply(.SD, mean), .SDcols = paste0("M", seq_len(66)), by = list(F1, F2)]
Or you could use dplyr:
library(dplyr)
df %>%
group_by(F1, F2) %>%
summarise_each(funs(mean))
## Source: local data frame [2 x 6]
## Groups: F1
## F1 F2 M1 M2 M3 M66
## 1 1 A 6 6.5 7.5 4
## 2 1 B 4 1.0 2.0 5
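Note that summarise_each() and funs() have been deprecated in later dplyr releases. A minimal sketch of the same grouped mean using across() is given here, assuming dplyr 1.0.0 or newer and that the measurement columns all start with "M" (as in the question). The small df built below is illustration data only:

```r
library(dplyr)

# Illustration data in the question's layout (only four M columns shown)
df <- data.frame(
  F1  = c(1L, 1L, 1L),
  F2  = c("A", "B", "A"),
  M1  = c(3, 4, 9),
  M2  = c(5, 1, 8),
  M3  = c(8, 2, 7),
  M66 = c(1, 5, 7)
)

# Average every column whose name starts with "M" within each F1/F2 group;
# the column names stay the same in the result
res <- df %>%
  group_by(F1, F2) %>%
  summarise(across(starts_with("M"), mean), .groups = "drop")

res
```

Because across() applies the function in place, no new names such as MEAN need to be spelled out for each of the 66 columns.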
Here's a base R solution, which I suspect will be more efficient than aggregate or ddply:
t(vapply(split(df[, -c(1:2)], df[, 1:2], drop = TRUE), colMeans, double(4))) # In your case it will be double(66)
## M1 M2 M3 M66
## 1.A 6 6.5 7.5 4
## 1.B 4 1.0 2.0 5
Upvotes: 4
Reputation: 886938
Or using base R
aggregate(.~F1+F2, df, mean)
# F1 F2 M1 M2 M3 M66
#1 1 A 6 6.5 7.5 4
#2 1 B 4 1.0 2.0 5
Using ddply, you can apply the function column-wise with numcolwise:
library(plyr)
ddply(df, .(F1, F2), numcolwise(mean))
# F1 F2 M1 M2 M3 M66
#1 1 A 6 6.5 7.5 4
#2 1 B 4 1.0 2.0 5
Data:
df <- structure(list(F1 = c(1L, 1L, 1L), F2 = c("A", "B", "A"), M1 = c(3L,
4L, 9L), M2 = c(5L, 1L, 8L), M3 = c(8L, 2L, 7L), M66 = c(1L,
5L, 7L)), .Names = c("F1", "F2", "M1", "M2", "M3", "M66"), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1
Reputation: 70613
Using base functions, you could do:
mydf <- data.frame(F1 = sample(c("a", "b", "c"), 100, replace = TRUE),
F2 = sample(c("1", "2"), 100, replace = TRUE),
M1 = runif(100),
M2 = runif(100),
M3 = runif(100))
aggregate(. ~ F1 + F2, FUN = mean, data = mydf)
F1 F2 M1 M2 M3
1 a 1 0.5787761 0.5044229 0.4641159
2 b 1 0.5427231 0.4923563 0.5289595
3 c 1 0.5145906 0.5709069 0.4812297
4 a 2 0.4161674 0.4815931 0.5127524
5 b 2 0.5018423 0.4337168 0.5563098
6 c 2 0.4326560 0.4749937 0.4575443
This averages every column other than F1 and F2. To include only specific M* columns, you could construct a formula that names them, or subset the data.frame first, for example using grepl.
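Both options could look roughly like the sketch below. It assumes the wanted columns are named M followed by digits; the extra column `other`, the fixed seed, and the helper names `keep`, `m_cols` and `fml` are made up for the illustration:

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

mydf <- data.frame(F1 = sample(c("a", "b", "c"), 100, replace = TRUE),
                   F2 = sample(c("1", "2"), 100, replace = TRUE),
                   M1 = runif(100),
                   M2 = runif(100),
                   M3 = runif(100),
                   other = runif(100))  # hypothetical column to leave out

# Option 1: subset with grepl, keeping the grouping columns and M<digits>
keep <- grepl("^(F1|F2|M[0-9]+)$", names(mydf))
res_subset <- aggregate(. ~ F1 + F2, data = mydf[, keep], FUN = mean)

# Option 2: build the formula from the matching column names
m_cols <- grep("^M[0-9]+$", names(mydf), value = TRUE)
fml <- as.formula(paste0("cbind(", paste(m_cols, collapse = ", "), ") ~ F1 + F2"))
res_formula <- aggregate(fml, data = mydf, FUN = mean)
```

Either way the result keeps the original M* names and drops `other`. The subset version is probably the easier one to read when there are 66 columns.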
Upvotes: 0