Reputation: 25
I'm new to R. I'd like to get a number of statistics on the numeric columns (say, column C) of a data frame (dt) based on the combination of factor columns (say, columns A and B). First, I want the results by grouping both columns A and B, and then the same operations by A alone and by B alone. I've written a code that looks like the one below. I have a list of the factor combinations that I'd like to test (groupList) and then for each iteration of the loop I feed an element of that list as the argument to "by". However, as surely you can see, it doesn't work. R doesn't recognize the elements of the list as arguments to the function "by". Any ideas on how to make this work? Any pointer or suggestion is welcome and appreciated.
groupList <- list(".(A, B)", "A", "B")
for(i in 1:length(groupList)){
output <- dt[,list(mean=mean(C),
sd=sd(C),
min=min(C),
median=median(C),
max=max(C)),
by = groupList[i]]
Here insert code to save each output
}
Upvotes: 1
Views: 735
Reputation: 93813
Your groupList
can be restructured as a list of character vectors. Then you can either use lapply
or the existing for
loop with an added eval()
to interpret the by=
input properly:
set.seed(1)
dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)
groupList <- list(c("A", "B"), c("A"), c("B"))
lapply(
groupList,
function(x) {
dt[, .(mean=mean(C), sd=sd(C)), by=x]
}
)
out <- vector("list", 3)
for(i in 1:length(groupList)){
out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}
str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# ..$ A : int [1:6] 1 1 1 2 2 2
# ..$ B : int [1:6] 1 2 3 3 4 5
# ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
# ..$ sd : num [1:6] 0.707 0.707 NA NA 0.707 ...
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# ..$ A : int [1:2] 1 2
# ..$ mean: num [1:2] 3 8
# ..$ sd : num [1:2] 1.58 1.58
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
# ..$ B : int [1:5] 1 2 3 4 5
# ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
# ..$ sd : num [1:5] 0.707 0.707 0.707 0.707 0.707
Upvotes: 0
Reputation: 3619
For demonstration, I used the mtcars
data set. Here is one way with the dplyr
package.
library(dplyr)
# create a vector of functions that you need
describe <- c("mean", "sd", "min", "median", "max")
# group by the variable gear
mtcars %>%
group_by(gear) %>%
summarise_at(vars(mpg), describe)
# group by the variable carb
mtcars %>%
group_by(carb) %>%
summarise_at(vars(mpg), describe)
# group by both gear and carb
mtcars %>%
group_by(gear, carb) %>%
summarise_at(vars(mpg), describe)
Upvotes: 0
Reputation: 53
I guess aggregate
function can solve your problem. Let us say you have a dataframe df
contains three columns A
,B
,C
,given as:
df<-data.frame(A=rep(letters[1:3],3),B=rep(letters[4:6],each=3),C=1:9)
If you want calculate mean of C
by factor A
, try:
aggregate(formula=C~A,data=df,FUN=mean)
by factor B
, try:
aggregate(formula=C~B,data=df,FUN=mean)
by factor A
and B
, try:
aggregate(formula=C~A+B,data=df,FUN=mean)
Upvotes: 2