Reputation: 1823

Create a function for mean calculation using a specific rule

I need to create a function that calculates a mean following a specific rule, without using the apply or aggregate functions. I have 3 variables, and within the same function I would like to calculate first the mean of var3 for each change in var2, and then the mean of var3 for each change in var1. Is this possible? My code is:

Variable 1

var1 <- sort(rep(LETTERS[1:3],10))

Variable 2

var2 <- rep(1:5,6)

Variable 3

var3 <- rnorm(30)

Create data frame

DB<-NULL
DB<-cbind(var1,var2,as.numeric(var3))
head(DB)

Function to calculate the mean following a rule

mymean <- function(x, db=DB){

for (1:length(db[,1])){

if (db[,[i]] != db[,[i]]) {
mean(db[,[i]])
}
else (db[,[i]] == db[,[i]]) {
stop("invalid rule") 
}}

This is where the problems start: the function doesn't work.

Thanks, Alexandre

Upvotes: 0

Answers (1)

Jacob H

Reputation: 4513

It appears that you want to obtain means by groups.

To do this, I would use the dplyr package:

library(dplyr)

db <- data.frame(var1 = sort(rep(LETTERS[1:3],10)), var2=rep(1:5,6), var3=rnorm(30))
db %>%
  group_by(var1) %>%
  summarise(mean_over_va1 = mean(var3))
  var1 mean_over_va1
1    A    0.07314416
2    B   -0.05983557
3    C   -0.03592565

db %>%
  group_by(var2) %>%
  summarise(mean_over_va2 = mean(var3))

  var2 mean_over_va2
 1    1 -0.4512942044
 2    2 -0.1331316802
 3    3  0.0821958902
 4    4 -0.0001081054
 5    5  0.4646429921
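
As a rough sanity check, a grouped mean like the ones above can be reproduced by plain subsetting, without dplyr. The sketch below is illustrative only: the seed and the names check_db, mean_A and mean_1 are assumptions added here, and because var3 comes from rnorm the numbers will not match the output shown above.

```r
# Illustrative check only: the seed and the name check_db are assumptions,
# not part of the original answer.
set.seed(42)
check_db <- data.frame(var1 = sort(rep(LETTERS[1:3], 10)),
                       var2 = rep(1:5, 6),
                       var3 = rnorm(30))

# Mean of var3 over the rows where var1 is "A" (rows 1 to 10 here)
mean_A <- mean(check_db$var3[check_db$var1 == "A"])

# Mean of var3 over the rows where var2 is 1 (rows 1, 6, 11, 16, 21, 26)
mean_1 <- mean(check_db$var3[check_db$var2 == 1])
```

Each value should agree with the corresponding row of the group_by/summarise result computed on the same data frame.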

From your comments, however, it appears that you don't want to use functions like apply and aggregate, so I assume you may not like the above solution.

If I had to do this by brute force, I would do something like this:

db <- data.frame(var1 = sort(rep(LETTERS[1:3],10)), var2=rep(1:5,6), var3=rnorm(30), stringsAsFactors = FALSE)

# Obtain the distinct groups in each grouping variable
group1 <- unique(db$var1)
group2 <- unique(db$var2)

# Store the number of groups so length() is not called repeatedly
N1 <- length(group1)
N2 <- length(group2)

# Preallocate the results; not necessary, but a good habit
res1 <- data.frame(group = group1, mean = rep(NA, N1))
res2 <- data.frame(group = group2, mean = rep(NA, N2))


# Loop over the group members rather than over each row of data.
# I like this approach because it relies more on subsetting than on iteration,
# which is usually a good idea in R.
# seq_len() is used instead of seq(1, N) so the loop is skipped when N is 0.
for (i in seq_len(N1)) {
  res1[i, "mean"] <- mean(db[db$var1 %in% group1[i], "var3"])
}

for (i in seq_len(N2)) {
  res2[i, "mean"] <- mean(db[db$var2 %in% group2[i], "var3"])
}

res <- list(res1, res2)
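
Since the question asks for both sets of means from the same function, the two loops could be wrapped up roughly as follows. This is only a sketch: the function name group_means and its arguments data, value_col and group_cols are hypothetical names introduced here, and the seed is added just to make the example reproducible.

```r
# Hypothetical helper: the name group_means and its arguments are
# illustrative assumptions, not an existing R function.
group_means <- function(data, value_col, group_cols) {
  # One result slot per grouping column, named after that column
  out <- vector("list", length(group_cols))
  names(out) <- group_cols

  for (g in group_cols) {
    groups <- unique(data[[g]])
    # Preallocate one row per distinct group value
    res <- data.frame(group = groups,
                      mean = rep(NA_real_, length(groups)),
                      stringsAsFactors = FALSE)
    for (i in seq_along(groups)) {
      # Subset the value column to the rows belonging to the current group
      in_group <- data[[g]] %in% groups[i]
      res[i, "mean"] <- mean(data[[value_col]][in_group])
    }
    out[[g]] <- res
  }
  out
}

set.seed(1)  # assumed seed, only for reproducibility
db <- data.frame(var1 = sort(rep(LETTERS[1:3], 10)),
                 var2 = rep(1:5, 6),
                 var3 = rnorm(30),
                 stringsAsFactors = FALSE)

res <- group_means(db, "var3", c("var1", "var2"))
res$var1  # means of var3 for each value of var1
res$var2  # means of var3 for each value of var2
```

Returning a named list keeps the two results separate, because they have different numbers of rows (3 groups for var1, 5 for var2).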

Upvotes: 1