Reputation: 143
I have a data frame Mesure and I wish to determine, for each row, a weighted mean like this:
weighted_mean = ((mean_Mesure x nbr_Mesure) + (mean_DL x nbr_DL)) / (nbr_Mesure + nbr_DL)
I know there is a weighted.mean function, but I failed to obtain a new column "weighted_mean" with it.
Also, is it an issue if a row does not necessarily have all 4 values needed for this formula (such as row 6 in Mesure)?
> head(Mesure)
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL
2 Aquatic_moss.BE-7 123 4 542 12
3 Aquatic_moss.CO-57 100 7 117 14
4 Aquatic_moss.CO-58 120 5 145 12
5 Aquatic_moss.CO-60 140 5 153 12
6 Aquatic_moss.CS-134 NA NA 146 15
Upvotes: 0
Views: 132
Reputation: 825
You can use the rowwise() function in the new dplyr:
library(dplyr) # 1.0.0
Mesure %>%
rowwise() %>%
mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
# A tibble: 5 x 6
# Rowwise:
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aquatic_moss.BE-7 123 4 542 12 437.
2 Aquatic_moss.CO-57 100 7 117 14 111.
3 Aquatic_moss.CO-58 120 5 145 12 138.
4 Aquatic_moss.CO-60 140 5 153 12 149.
5 Aquatic_moss.CS-134 NA NA 146 15 NA
EDIT
If we want to replace NAs with 0, then we can use the replace_na() function from tidyr:
library(dplyr)
library(tidyr) # 1.1.0
Mesure %>%
replace_na(list(mean_Mesure = 0,
nbr_Mesure = 0,
mean_DL = 0,
nbr_DL = 0)) %>%
rowwise() %>%
mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
# A tibble: 5 x 6
# Rowwise:
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aquatic_moss.BE-7 123 4 542 12 437.
2 Aquatic_moss.CO-57 100 7 117 14 111.
3 Aquatic_moss.CO-58 120 5 145 12 138.
4 Aquatic_moss.CO-60 140 5 153 12 149.
5 Aquatic_moss.CS-134 0 0 146 15 146
DATA
Mesure <- structure(list(Row.names = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57",
"Aquatic_moss.CO-58", "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"
), mean_Mesure = c(123, 100, 120, 140, NA), nbr_Mesure = c(4,
7, 5, 5, NA), mean_DL = c(542, 117, 145, 153, 146), nbr_DL = c(12,
14, 12, 12, 15)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Reputation: 4233
You can also use mapply. That way you can use a generic function and pass any columns to it:
df <- read.table(text = "
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL
2 Aquatic_moss.BE-7 123 4 542 12
3 Aquatic_moss.CO-57 100 7 117 14
4 Aquatic_moss.CO-58 120 5 145 12
5 Aquatic_moss.CO-60 140 5 153 12
6 Aquatic_moss.CS-134 NA NA 146 15 ")
df$mean_Mesure[is.na(df$mean_Mesure)] <- 0
df$nbr_Mesure[is.na(df$nbr_Mesure)] <- 0
df$weighted.mean <- mapply(function(x1,x2,x3,x4) (x1*x2 + x3*x4)/(x2+x4), df$mean_Mesure, df$nbr_Mesure, df$mean_DL, df$nbr_DL)
Output
Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
2 Aquatic_moss.BE-7 123 4 542 12 437.2500
3 Aquatic_moss.CO-57 100 7 117 14 111.3333
4 Aquatic_moss.CO-58 120 5 145 12 137.6471
5 Aquatic_moss.CO-60 140 5 153 12 149.1765
6 Aquatic_moss.CS-134 0 0 146 15 146.0000
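Since the question mentions weighted.mean, the same mapply pattern could also call it directly instead of spelling out the formula. This is only a sketch on the sample data: the column name weighted_mean is taken from the question, and na.rm = TRUE is an assumption about how missing rows should be treated (a missing mean is dropped together with its weight, so no zero-filling is needed first).

```r
# Same sample data as in the question (the last row has no measured values).
df <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# For each row, average the two means, weighted by their counts.
# With na.rm = TRUE, weighted.mean() removes a missing value and its weight.
df$weighted_mean <- mapply(
  function(m1, n1, m2, n2) weighted.mean(c(m1, m2), c(n1, n2), na.rm = TRUE),
  df$mean_Mesure, df$nbr_Mesure, df$mean_DL, df$nbr_DL
)

df
```

Note that na.rm only looks at the values, not the weights: a row with a mean present but its count missing would still return NA.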
Upvotes: 1
Reputation: 39737
In your case you can use your equation as it is to get the weighted mean per row:
with(Mesure, ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
#[1] 437.2500 111.3333 137.6471 149.1765 NA
When there are missing values, it will return NA. If NA should be treated as 0, you can set it to 0:
Mesure[is.na(Mesure)] <- 0
after which the same expression gives:
#[1] 437.2500 111.3333 137.6471 149.1765 146.0000
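Be aware that Mesure[is.na(Mesure)] <- 0 overwrites every NA in the data frame. If the original values should stay untouched, one possible variant is to substitute 0 only inside the calculation and store the result in the weighted_mean column the question asks for. The helper na0 below is a hypothetical name introduced for this sketch, and the data frame is rebuilt from the sample rows.

```r
# Sample data from the question; the last row has no measured values.
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# Hypothetical helper: return a copy of x with NA replaced by 0.
na0 <- function(x) replace(x, is.na(x), 0)

# The arithmetic is vectorised, so one expression covers all rows.
Mesure$weighted_mean <- with(
  Mesure,
  (na0(mean_Mesure) * na0(nbr_Mesure) + na0(mean_DL) * na0(nbr_DL)) /
    (na0(nbr_Mesure) + na0(nbr_DL))
)

Mesure
```

The NA entries in mean_Mesure and nbr_Mesure remain in the data frame. A row where both counts are missing or zero would give NaN, because the denominator becomes 0.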
Upvotes: 1