Tpellirn

Reputation: 796

How to replace values by the group mean for all columns?

I have this data frame:

    df <- structure(list(D = c(-76, -74, -72, -70, -44, -42),
                         A = c(83, 83, 82, 82, 81, 81),
                         B = c(-0.613, -0.4, -0.5, -0.68, -0.13, -0.26)),
                    row.names = c(NA, 6L), class = "data.frame")

I would like to compute the mean of all values in B that share the same value in A.

For instance, -0.613 and -0.4 should be averaged because both correspond to A = 83, and so on for the other values of A.

For B alone, I can simply do this:

   df$Bmean <- with(df, ave(B, A))

However, this only handles B. I need to do the same thing for all columns (B, D, etc.) in df.

Upvotes: 1

Views: 169

Answers (2)

Duck

Reputation: 39613

You could use one of these approaches, depending on whether you want to keep every row or get one row per value of A:

library(dplyr)
#Approach 1: replace every value with its group mean (keeps all rows)
df %>% group_by(A) %>% mutate_all(mean, na.rm = TRUE)

# A tibble: 6 x 3
# Groups:   A [3]
      D     A      B
  <dbl> <dbl>  <dbl>
1   -75    83 -0.506
2   -75    83 -0.506
3   -71    82 -0.59 
4   -71    82 -0.59 
5   -43    81 -0.195
6   -43    81 -0.195

#Approach 2: collapse to one row per value of A
df %>% group_by(A) %>% summarise_all(mean, na.rm = TRUE)

# A tibble: 3 x 3
      A     D      B
  <dbl> <dbl>  <dbl>
1    81   -43 -0.195
2    82   -71 -0.59 
3    83   -75 -0.506
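
If you would rather avoid dplyr, the one-row-per-group result of Approach 2 can also be sketched in base R with aggregate. This is a swapped-in base R technique, not part of the approaches above, and group_means is just an illustrative name. Note that the formula interface of aggregate drops rows containing NA by default, which is not the same as na.rm = TRUE applied per column.

```r
# Rebuild the data frame from the question so the example is self-contained
df <- data.frame(
  D = c(-76, -74, -72, -70, -44, -42),
  A = c(83, 83, 82, 82, 81, 81),
  B = c(-0.613, -0.4, -0.5, -0.68, -0.13, -0.26)
)

# ". ~ A" means: summarise every other column, grouped by A
group_means <- aggregate(. ~ A, data = df, FUN = mean)
print(group_means)
```

The result has one row for each distinct value of A, sorted by A, with the group mean of D and B.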

Upvotes: 1

akrun

Reputation: 887951

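We can use mutate with across from dplyr to handle multiple columns at once. With a named list of functions, this adds new columns (D_mean, B_mean) next to the originals: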

library(dplyr) # 1.0.0
df %>% 
   group_by(A) %>%
   mutate(across(everything(), list(mean = ~ mean(.))))

If the goal is to replace the original columns with their group means:

df %>%
   group_by(A) %>%
   # lambda form; passing na.rm through across()'s ... is deprecated since dplyr 1.1.0
   mutate(across(everything(), ~ mean(.x, na.rm = TRUE)))

NOTE: na.rm = TRUE is added in case there are any NA values, because the default is na.rm = FALSE.
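
To see why na.rm matters, here is a minimal base R sketch using ave on made-up toy vectors (x, g, and mean_no_na are illustrative names, not part of the question's data). Without na.rm, a single NA turns the whole group mean into NA.

```r
# Toy data: group "b" contains one missing value
x <- c(1, 3, NA, 5)
g <- c("a", "a", "b", "b")

# Default FUN is mean with na.rm = FALSE, so group "b" becomes NA
with_na <- ave(x, g)

# Wrap mean so that missing values are dropped within each group
mean_no_na <- function(v) mean(v, na.rm = TRUE)
without_na <- ave(x, g, FUN = mean_no_na)

print(with_na)
print(without_na)
```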


Or, to have fine control over the column names, use the .names argument:

df1 <- df %>% 
         group_by(A) %>%
         mutate(across(everything(), list(mean = ~ mean(.)), .names = "{col}mean"))
df1
# A tibble: 6 x 5
# Groups:   A [3]
#      D     A      B Dmean  Bmean
#  <dbl> <dbl>  <dbl> <dbl>  <dbl>
#1   -76    83 -0.613   -75 -0.506
#2   -74    83 -0.4     -75 -0.506
#3   -72    82 -0.5     -71 -0.59 
#4   -70    82 -0.68    -71 -0.59 
#5   -44    81 -0.13    -43 -0.195
#6   -42    81 -0.26    -43 -0.195

Or use ave for multiple columns: get the vector of column names other than the grouping column "A" with setdiff (nm1), loop over that vector, pass each column to ave together with df$A, and assign the results back to the dataset as new columns named with paste0.

nm1 <- setdiff(names(df), "A")
df[paste0(nm1, "mean")] <- lapply(nm1, function(nm)  ave(df[[nm]], df$A))
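
If the original columns should be overwritten rather than kept alongside new ones, the same ave idea can be sketched as follows. This is a variation on the code above under that assumption; df_replaced is an illustrative name used so the original df stays untouched.

```r
# Rebuild the data frame from the question so the example is self-contained
df <- data.frame(
  D = c(-76, -74, -72, -70, -44, -42),
  A = c(83, 83, 82, 82, 81, 81),
  B = c(-0.613, -0.4, -0.5, -0.68, -0.13, -0.26)
)

# Every column except the grouping column A
nm1 <- setdiff(names(df), "A")

# Work on a copy and overwrite each column with its per-A group mean
df_replaced <- df
df_replaced[nm1] <- lapply(df[nm1], function(col) ave(col, df$A))
print(df_replaced)
```

Column A is left as-is, and the number and order of rows do not change.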

Upvotes: 1
