I want to write a function that takes a data frame and a string naming a grouping column (here "GENDER"). The function should compute the mean and SD of every other variable in the data frame by GENDER and store the result in a new data frame named "GENDERstats" that I can use in later analysis.
I can get everything I want up to the point where I name the new "GENDERstats" data frame; that step throws an error.
Here's what I have so far, with dummy data:
df <- data.frame(GENDER = c("M","F","M","F","M","F"),
                 HELP   = c(5,4,2,7,5,5),
                 CARE   = c(6,4,7,8,5,4),
                 TRUST  = c(6,5,3,6,8,6),
                 SERVE  = c(6,5,7,8,7,6))
my.func <- function(dat, bias){
  datFrame <- data.frame()
  for(i in 2:5){
    d1 <- aggregate(dat[,i], by=list(dat[,bias]), FUN=mean, na.rm=TRUE)
    d2 <- aggregate(dat[,i], by=list(dat[,bias]), FUN=sd, na.rm=TRUE)
    d1$sd <- d2$x
    d1$Var <- i
    datFrame <- rbind(datFrame, d1)
  }
  # paste(bias,"stats") <- datFrame
}
I get the data frame I want in "datFrame", but I want to paste the bias value and "stats" together to name a new data frame. I will be doing this with several different "biases".
I want the new df to look like this:
  Group.1        x        sd Var
1       F 5.333333 1.5275252   2
2       M 4.000000 1.7320508   2
3       F 5.333333 2.3094011   3
4       M 6.000000 1.0000000   3
5       F 5.666667 0.5773503   4
6       M 5.666667 2.5166115   4
7       F 6.333333 1.5275252   5
8       M 6.666667 0.5773503   5
From there I can plot graphs or focus only on the means or SDs.
Upvotes: 0
I'm not quite sure how to fix your function (a couple of details are missing), but you can get the same result without a user-defined function or a for loop. The following iterates over the non-GENDER columns, computes the mean and SD of each by GENDER with aggregate, and then rbinds the resulting data frames inside do.call:
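do.call("rbind", lapply(2:ncol(df),
                        function(j) {
                          # group means for column j
                          df_out <- aggregate(df[j], list(df$GENDER), "mean")
                          # group SDs for column j, added as a third column
                          df_out[3] <-
                            aggregate(df[j], list(df$GENDER), "sd")[[2]]
                          # record which column these rows describe
                          df_out[4] <- j
                          `names<-`(df_out, c("gender", "x", "sd", "var"))
                        }))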
#### OUTPUT ####
  gender       x      sd var
1      F 5.33333 1.52753   2
2      M 4.00000 1.73205   2
3      F 5.33333 2.30940   3
4      M 6.00000 1.00000   3
5      F 5.66667 0.57735   4
6      M 5.66667 2.51661   4
7      F 6.33333 1.52753   5
8      M 6.66667 0.57735   5
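If a reusable function is still wanted, one possible sketch is to have the function return the data frame and do the naming at the call site. R does not accept a paste() call as the target of `<-`, which is why the commented-out line in the question fails; a name built from strings can instead be used as a list key, or passed to assign(). The helper name group_stats and the stats list below are hypothetical, and Var here stores the column name rather than the column index. Note also that paste() inserts a space by default, so paste0() is used to get "GENDERstats".

```r
# Hypothetical helper (not from the question): mean and SD of every
# column other than the grouping column `bias`, computed per group.
group_stats <- function(dat, bias) {
  vars <- setdiff(names(dat), bias)
  out <- lapply(vars, function(v) {
    res <- aggregate(dat[[v]], by = list(dat[[bias]]), FUN = mean, na.rm = TRUE)
    res$sd <- aggregate(dat[[v]], by = list(dat[[bias]]), FUN = sd, na.rm = TRUE)$x
    res$Var <- v  # assumption: keep the column name instead of its index
    res
  })
  do.call(rbind, out)
}

df <- data.frame(GENDER = c("M","F","M","F","M","F"),
                 HELP   = c(5,4,2,7,5,5),
                 CARE   = c(6,4,7,8,5,4),
                 TRUST  = c(6,5,3,6,8,6),
                 SERVE  = c(6,5,7,8,7,6))

# Option 1: keep results in a named list, one entry per "bias"
stats <- list()
stats[[paste0("GENDER", "stats")]] <- group_stats(df, "GENDER")

# Option 2: create a variable whose name is built from a string
assign(paste0("GENDER", "stats"), group_stats(df, "GENDER"))
```

The list version tends to be easier to work with when there are several biases, since the results can be looped over with lapply instead of being looked up by constructed variable names.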
There may well be a slicker way of doing this in base R. Personally, I would go with tidyr's gather followed by dplyr's group_by and summarise, which is much cleaner and easier to understand. The output is essentially the same as the above, just in a different order and with variable names instead of column indices. The rounding only looks different because of how tibbles are printed:
library(dplyr)
library(tidyr)
df %>%
  gather(var, val, -GENDER) %>%   # reshape to long format: one row per GENDER/variable/value
  group_by(GENDER, var) %>%
  summarise(x = mean(val), sd = sd(val))
#### OUTPUT ####
# A tibble: 8 x 4
# Groups:   GENDER [2]
  GENDER var       x    sd
  <chr>  <chr> <dbl> <dbl>
1 F      CARE   5.33 2.31
2 F      HELP   5.33 1.53
3 F      SERVE  6.33 1.53
4 F      TRUST  5.67 0.577
5 M      CARE   6    1
6 M      HELP   4    1.73
7 M      SERVE  6.67 0.577
8 M      TRUST  5.67 2.52
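On the question of a shorter base R route: one option, sketched here rather than offered as the definitive approach, is aggregate's formula interface with a function that returns both statistics at once. The result is in wide format (one row per gender) rather than the long format shown above, and the names res and flat are just illustrative.

```r
df <- data.frame(GENDER = c("M","F","M","F","M","F"),
                 HELP   = c(5,4,2,7,5,5),
                 CARE   = c(6,4,7,8,5,4),
                 TRUST  = c(6,5,3,6,8,6),
                 SERVE  = c(6,5,7,8,7,6))

# `. ~ GENDER` means "every other column, grouped by GENDER".
# Each variable comes back as a two-column matrix (mean, sd).
res <- aggregate(. ~ GENDER, data = df,
                 FUN = function(v) c(mean = mean(v), sd = sd(v)))

# Flatten the matrix columns into ordinary columns such as HELP.mean, HELP.sd
flat <- do.call(data.frame, res)
flat
```

One caveat: the formula interface drops rows containing NA by default (na.action = na.omit), which removes the whole row rather than just the missing value, so this is not equivalent to na.rm = TRUE when the data have gaps.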
Upvotes: 1