Jacob

Reputation: 422

Can I combine two strings into one and use the result as the name of a data frame?

I want to write a function that takes a data frame and a string naming a grouping column (here "GENDER"). The function should compute the mean and sd of each variable in the df by GENDER and store the result in a new df named "GENDERstats" that I can use in further analysis later on.

Everything works up to the point where I name the new "GENDERstats" df; that step throws an error.

Here's what I have so far, with dummy data:

df <- data.frame(GENDER=c("M","F","M","F","M","F"),HELP=c(5,4,2,7,5,5),CARE=c(6,4,7,8,5,4),TRUST=c(6,5,3,6,8,6),SERVE=c(6,5,7,8,7,6))

my.func <- function(dat, bias){
  datFrame <- data.frame()
  for(i in 2:5){
    d1 <- aggregate(dat[,i],by=list(dat[,bias]),FUN=mean,na.rm=TRUE)
    d2 <- aggregate(dat[,i],by=list(dat[,bias]),FUN=sd,na.rm=TRUE)
    d1$sd <- d2$x
    d1$Var <- i
    datFrame <- rbind(datFrame,d1)
  }
  # paste(bias,"stats") <- datFrame
}


I get the df I want in "datFrame", but I want to paste the bias variable and "stats" together to name the new data frame. I will be doing this with several different "biases".

I want the new df to look like this:

  Group.1        x        sd Var
1       F 5.333333 1.5275252   2
2       M 4.000000 1.7320508   2
3       F 5.333333 2.3094011   3
4       M 6.000000 1.0000000   3
5       F 5.666667 0.5773503   4
6       M 5.666667 2.5166115   4
7       F 6.333333 1.5275252   5
8       M 6.666667 0.5773503   5

From there I can plot graphs or focus only on the means or sds.

Upvotes: 0

Views: 33

Answers (1)

user10191355

I'm not quite sure how to fix your function (a couple of details are missing), but you can get the same results without a named helper function or a for loop. The following iterates over every column other than GENDER, computes the group means and SDs with aggregate, and then rbinds the resulting data frames with do.call:

do.call("rbind", lapply(2:ncol(df),
                        function(j) {
                            df_out <- aggregate(df[j], list(df$GENDER), "mean")
                            df_out[3] <-
                                aggregate(df[j], list(df$GENDER), "sd")[[2]]
                            df_out[4] <- j
                            `names<-`(df_out, c("gender", "x", "sd", "var"))
                        }))


#### OUTPUT ####

  gender       x      sd var
1      F 5.33333 1.52753   2
2      M 4.00000 1.73205   2
3      F 5.33333 2.30940   3
4      M 6.00000 1.00000   3
5      F 5.66667 0.57735   4
6      M 5.66667 2.51661   4
7      F 6.33333 1.52753   5
8      M 6.66667 0.57735   5
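
The same result can also be reached in base R by reshaping to long format first, so that a single aggregate call handles every variable. This is only a sketch: it uses stack to build the long data frame, and the final column names are chosen here just to match the output above.

```r
df <- data.frame(GENDER = c("M","F","M","F","M","F"),
                 HELP = c(5,4,2,7,5,5), CARE = c(6,4,7,8,5,4),
                 TRUST = c(6,5,3,6,8,6), SERVE = c(6,5,7,8,7,6))

# Long format: one row per (GENDER, variable, value).
# stack() returns the columns `values` and `ind` (the source column name).
long <- cbind(GENDER = df$GENDER, stack(df[-1]))

# One aggregate call; FUN returns two numbers per group,
# so the result holds a two-column matrix named `values`.
res <- aggregate(values ~ GENDER + ind, data = long,
                 FUN = function(v) c(x = mean(v), sd = sd(v)))

# Flatten the matrix column into ordinary columns and rename.
res <- do.call(data.frame, res)
names(res) <- c("gender", "var", "x", "sd")
res
```

Here var holds the variable name (HELP, CARE, ...) rather than the column index.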

There may be a slicker way of doing this in base R. Personally, I would go with tidyr's gather followed by dplyr's group_by + summarise, which is much cleaner and easier to understand. The output is essentially the same as above, just in a different order and with variable names instead of column indices in var. The rounding only looks different because of how tibbles are printed:

library(dplyr)  
library(tidyr)  

df %>% 
    gather(var, val, -GENDER) %>% 
    group_by(GENDER, var) %>% 
    summarise(x = mean(val), sd = sd(val))

#### OUTPUT ####

# A tibble: 8 x 4
# Groups:   GENDER [2]
  GENDER var       x    sd
  <chr>  <chr> <dbl> <dbl>
1 F      CARE   5.33 2.31 
2 F      HELP   5.33 1.53 
3 F      SERVE  6.33 1.53 
4 F      TRUST  5.67 0.577
5 M      CARE   6    1    
6 M      HELP   4    1.73 
7 M      SERVE  6.67 0.577
8 M      TRUST  5.67 2.52 
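
As for the naming part of the question: paste(bias, "stats") <- datFrame fails because the left-hand side of <- has to be a name, not a string built at run time. Two common ways around that are sketched below. group_stats is a hypothetical helper written for this example (it is not the function from the question), and paste0 is used instead of paste so that the name has no space in it. Storing the results in a named list is usually easier to work with than creating variables via assign, especially with several "biases".

```r
df <- data.frame(GENDER = c("M","F","M","F","M","F"),
                 HELP = c(5,4,2,7,5,5), CARE = c(6,4,7,8,5,4),
                 TRUST = c(6,5,3,6,8,6), SERVE = c(6,5,7,8,7,6))

# Hypothetical helper: mean and SD of every column other than `bias`,
# grouped by the column whose name is given in `bias`.
group_stats <- function(dat, bias) {
  vars <- setdiff(names(dat), bias)
  out <- lapply(vars, function(v) {
    d <- aggregate(dat[[v]], by = list(dat[[bias]]), FUN = mean, na.rm = TRUE)
    d$sd <- aggregate(dat[[v]], by = list(dat[[bias]]), FUN = sd, na.rm = TRUE)$x
    d$Var <- v   # variable name instead of column index
    d
  })
  do.call(rbind, out)  # return the combined data frame
}

# Option 1: a named list, keyed by the pasted-together name.
stats <- list()
for (bias in c("GENDER")) {   # add further grouping columns here
  stats[[paste0(bias, "stats")]] <- group_stats(df, bias)
}
stats$GENDERstats

# Option 2: create a variable literally called "GENDERstats".
assign(paste0("GENDER", "stats"), group_stats(df, "GENDER"), envir = globalenv())
GENDERstats
```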

Upvotes: 1
