Reputation: 1101
I calculate an index using dplyr. The index is the sum of the squared ratios between each entry and the group total.
library(dplyr)
set.seed(1e2)
firm_id <- sample(1:3, 1e2, replace = TRUE)
pro_id <- sample(1:8, 1e2, replace = TRUE)
emplo_id <- sample(1:5, 1e2, replace = TRUE)
cost <- round(abs(rnorm(1e2, 20)), 2)
df <- data.frame(firm_id, pro_id, emplo_id, cost)
df_index <- df %>%
  group_by(firm_id, pro_id) %>%
  mutate(INDEX = sum((cost / sum(cost))^2))
I now want to calculate how much each entry contributes to the index of its group. That is, for every entry in turn (as if in a loop), I want to recalculate the index as if that entry's cost were 0, and then divide the new index by the old one.
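For reference, the two quantities can be written out as follows. The notation ($c_i$ for the costs in one group, $S$ for the group total, $I$ and $I^{(-k)}$ for the old and new index, $r_k$ for their ratio) is introduced here only for illustration and is not part of the original post.

```latex
\[
S = \sum_{j=1}^{n} c_j, \qquad
I = \sum_{i=1}^{n} \left(\frac{c_i}{S}\right)^{2}
\]
\[
I^{(-k)} = \sum_{i \neq k} \left(\frac{c_i}{S - c_k}\right)^{2}
         = \frac{\sum_{i=1}^{n} c_i^{2} - c_k^{2}}{\left(S - c_k\right)^{2}},
\qquad
r_k = \frac{I^{(-k)}}{I}
\]
% Worked check with costs (1, 50, 100):
% I      = (1 + 2500 + 10000) / 151^2 = 12501 / 22801 \approx 0.5483
% I^{(-1)} = (2500 + 10000) / 150^2   = 12500 / 22500 \approx 0.5556
```

Setting an entry to 0 removes it from both the numerator and the group total, which is why the denominator changes from $S$ to $S - c_k$.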
Expected results:
firm_id <- c(1,1,1)
pro_id <- c(1,1,1)
emplo_id <- c(1:3)
cost <- c(1,50,100)
INDEX <- rep(0.5482654,3)
newINDEX <- c(0.5555556,0.9803941,0.9615532)
df_index <- data.frame(firm_id, pro_id, emplo_id, cost, INDEX, newINDEX)
With mutate I have no idea how to do it. Any suggestion welcome!
Upvotes: 2
Views: 1291
Reputation: 11878
You can use purrr::map_dbl() to loop over the row indices within each group, and then apply a function that replaces the cost at a given index with 0 and then recalculates the index. Here's an example with the data that you gave the expected output for:
library(dplyr)
library(purrr)
# The function used to calculate the index value
index <- function(x) sum((x / sum(x)) ^ 2)
df_index %>%
  group_by(firm_id, pro_id) %>%
  mutate(new = map_dbl(row_number(), function(i) {
    index(replace(cost, i, 0))
  }))
#> # A tibble: 3 x 7
#> # Groups:   firm_id, pro_id [1]
#>   firm_id pro_id emplo_id  cost INDEX newINDEX   new
#>     <dbl>  <dbl>    <int> <dbl> <dbl>    <dbl> <dbl>
#> 1       1      1        1     1 0.548    0.556 0.556
#> 2       1      1        2    50 0.548    0.980 0.980
#> 3       1      1        3   100 0.548    0.962 0.962
# The same calculation wrapped in a helper, so the mutate() call stays short:
# for each position in i, set that element of x to 0 and recalculate the index
index_without <- function(i, x) {
  map_dbl(i, function(i) index(replace(x, i, 0)))
}
df_index %>%
  group_by(firm_id, pro_id) %>%
  mutate(new = index_without(row_number(), cost))
#> # A tibble: 3 x 7
#> # Groups:   firm_id, pro_id [1]
#>   firm_id pro_id emplo_id  cost INDEX newINDEX   new
#>     <dbl>  <dbl>    <int> <dbl> <dbl>    <dbl> <dbl>
#> 1       1      1        1     1 0.548    0.556 0.556
#> 2       1      1        2    50 0.548    0.980 0.980
#> 3       1      1        3   100 0.548    0.962 0.962
Created on 2018-08-08 by the reprex package (v0.2.0.9000).
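If the per-row loop becomes slow on larger groups, the same numbers can probably be obtained without looping, because zeroing one entry only subtracts its square from the sum of squares and its value from the group total. The following is a minimal sketch under that assumption, not part of the original answer. The column name `ratio` is made up for illustration, and the step dividing the new index by the old one is based on how the question describes the contribution.

```r
library(dplyr)

# Data from the question's expected-results example
df <- data.frame(
  firm_id  = c(1, 1, 1),
  pro_id   = c(1, 1, 1),
  emplo_id = 1:3,
  cost     = c(1, 50, 100)
)

res <- df %>%
  group_by(firm_id, pro_id) %>%
  mutate(
    # Original index: sum of squared shares within the group
    INDEX = sum(cost^2) / sum(cost)^2,
    # Leave-one-out index: remove each row's cost from both the
    # sum of squares and the group total (vectorised over rows)
    newINDEX = (sum(cost^2) - cost^2) / (sum(cost) - cost)^2,
    # Hypothetical column: new index divided by the old one
    ratio = newINDEX / INDEX
  ) %>%
  ungroup()

print(res)
```

Like the map_dbl() version, this gives NaN for a group with a single row, since removing the only entry leaves 0 / 0.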
Upvotes: 2