n23

Reputation: 99

Find mean of multiple columns in R

My dataset looks like this:

values<-c(9,8,NA,8)
acceptance<-c(8,8,NA,6)
diffusion<-c(9,8,7,NA)
attitudes<-c(7,7,6,NA)
df<-data.frame(values,acceptance,diffusion,attitudes)

values  acceptance  diffusion  attitudes

  9         8          9           7
  8         8          8           7
  NA        NA         7           6
  8         6          NA          NA

I can use mean(df$values, na.rm = T) to get the mean of a single column, but I'd like to create a new variable holding the overall mean of specific columns (e.g. values and acceptance). I could run the same code for each column I want to include and then average the results by hand:

mean(df$values, na.rm = TRUE)      # 8.333
mean(df$acceptance, na.rm = TRUE)  # 7.333
(8.333 + 7.333) / 2                # 7.833
df$values_acceptance <- 7.833

But this would be really inefficient because I have many variables to include. I'm sure there is a much easier way to do this, but I'm still getting used to R.

Thanks in advance!

Upvotes: 4

Views: 4726

Answers (4)

Spooked

Reputation: 596

You could use mapply to get the mean of every column:

df2 <- mapply(mean, df, na.rm = TRUE)
df2


Which produces

   values acceptance  diffusion  attitudes 
  8.333333   7.333333   8.000000   6.666667 

Or, if you want the mean of the column means, you could use

mean(mapply(mean, df, na.rm = TRUE))

[1] 7.583333

To restrict this to specific columns, index the data frame first:

mean(mapply(mean, df[c(1:2)], na.rm = TRUE))

[1] 7.833333

Use na.rm a second time, in the outer mean, if one of the column means could itself be missing (for example, when a column is entirely NA):

mean(mapply(mean, df[c(1:2)], na.rm = TRUE), na.rm = TRUE)
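
To see why the outer na.rm matters, here is a minimal sketch. The all-NA column `empty` is a hypothetical addition for illustration and is not part of the question's data; sapply is used as a shorthand for the single-list mapply call.

```r
# Two columns from the question, plus a hypothetical all-NA column (an assumption)
df <- data.frame(
  values     = c(9, 8, NA, 8),
  acceptance = c(8, 8, NA, 6),
  empty      = c(NA_real_, NA_real_, NA_real_, NA_real_)
)

# Column means; the all-NA column yields NaN because nothing is left to average
col_means <- sapply(df, mean, na.rm = TRUE)

# Without na.rm in the outer call, the NaN propagates to the overall result
mean_without <- mean(col_means)

# With na.rm, the NaN is dropped and only the two valid column means are averaged
mean_with <- mean(col_means, na.rm = TRUE)
```

Here mean_without is NaN, while mean_with averages only the values and acceptance means.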

Upvotes: 0

wutao

Reputation: 94

Just use c() to combine the columns whose overall mean you want to calculate:

library(dplyr)
df %>% mutate(new = mean(c(values, acceptance), na.rm = TRUE))
  values acceptance diffusion attitudes      new
1      9          8         9         7 7.833333
2      8          8         8         7 7.833333
3     NA         NA         7         6 7.833333
4      8          6        NA        NA 7.833333
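
One caveat worth checking: pooling with c() gives every non-missing observation equal weight, so it matches the mean of the column means only when the columns have the same number of non-missing values (as they do in the question's data, three each). A minimal sketch with hypothetical vectors `a` and `b`, which are not from the question:

```r
# Hypothetical columns with unequal numbers of non-missing values
a <- c(10, 10, 10, 10)
b <- c(0, NA, NA, NA)

# Pooled mean: every non-missing observation counts once (40 / 5)
pooled <- mean(c(a, b), na.rm = TRUE)

# Mean of column means: every column counts once ((10 + 0) / 2)
mean_of_means <- mean(c(mean(a, na.rm = TRUE),
                        mean(b, na.rm = TRUE)))
```

Which of the two is appropriate depends on whether each observation or each variable should carry equal weight.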

Upvotes: 1

Kra.P

Reputation: 15123

You could approach it this way:

library(dplyr)

df %>%
  select(values, acceptance) %>%
  reshape2::melt() %>%   # stacks both columns into a single `value` column
  summarise(n = mean(value, na.rm = TRUE))

The result is:

         n
1 7.833333
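
If you would rather avoid the reshape2 dependency, the same stacking idea can be sketched in base R with unlist(). This is an alternative to melt(), not the code above, and the names `stacked` and `pooled_mean` are only illustrative.

```r
# Data from the question
df <- data.frame(
  values     = c(9, 8, NA, 8),
  acceptance = c(8, 8, NA, 6),
  diffusion  = c(9, 8, 7, NA),
  attitudes  = c(7, 7, 6, NA)
)

# Flatten the selected columns into one numeric vector
stacked <- unlist(df[c("values", "acceptance")])

# Average all non-missing values in the stacked vector
pooled_mean <- mean(stacked, na.rm = TRUE)
```

Like the melt() version, this pools the observations before averaging rather than averaging the column means.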

Upvotes: 0

akrun

Reputation: 886938

We can use colMeans on the selected columns, take the mean of the result, and assign it to a new column (no packages are needed):

df$values_acceptance <- mean(colMeans(df[c('values', 'acceptance')], na.rm = TRUE))

-output

> df
  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333

Or if we need dplyr

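library(dplyr)
df %>%
    # a lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0
    mutate(values_acceptance = mean(unlist(across(c(values,
         acceptance), ~ mean(.x, na.rm = TRUE)))))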

-output

  values acceptance diffusion attitudes values_acceptance
1      9          8         9         7          7.833333
2      8          8         8         7          7.833333
3     NA         NA         7         6          7.833333
4      8          6        NA        NA          7.833333
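
Since the question mentions several such combinations, the colMeans approach can be extended with a loop. This is only a sketch: the `groups` list and the second grouping (`diffusion_attitudes`) are hypothetical and would need to be replaced with your own column sets.

```r
# Data from the question
df <- data.frame(
  values     = c(9, 8, NA, 8),
  acceptance = c(8, 8, NA, 6),
  diffusion  = c(9, 8, 7, NA),
  attitudes  = c(7, 7, 6, NA)
)

# Hypothetical mapping: new column name -> columns to average (names are assumptions)
groups <- list(
  values_acceptance   = c("values", "acceptance"),
  diffusion_attitudes = c("diffusion", "attitudes")
)

# For each group, average the column means and store the result as a new column;
# the single value is recycled across all rows
for (new_col in names(groups)) {
  df[[new_col]] <- mean(colMeans(df[groups[[new_col]]], na.rm = TRUE))
}
```

Adding another composite variable then only requires one more entry in `groups`.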

Upvotes: 2
