Reputation: 33
I'm working on a research paper and I've got a dataframe that includes some departments and their budgets over a period of time. Let's take the following dataframe as an example.
df
departments budget
<chr> <dbl>
test1 100
test2 200
test3 300
For my case, "test1" and "test3" are two different expressions that actually refer to the same department. So I need to sum their budgets.
Here's the result that I expect
df
departments budget
<chr> <dbl>
test1 400
test2 200
Upvotes: 3
Views: 253
Reputation: 768
There is no need of using IDs. If your goal is combining every test3 with test1, and getting the sum of budget of this join, you can use dplyr
functions like this:
library(dplyr)
df %>%
mutate(departments, departments = ifelse(departments=="test3", "test1", departments)) %>%
group_by(departments) %>%
count(departments, wt=budget) -> df
The code above will give you the result you are looking for.
Upvotes: 1
Reputation: 16832
For a very small example with only 2 values being collapsed into 1, something like ifelse
is fine. But it doesn't scale well beyond a single operation collapsing a single pair of values—you'll be stuck in lots of nested ifelse
statements or maybe a case_when
. I do these operations a lot, and recommend converting your variable to a factor so you can more easily and flexibly manipulate levels. forcats::fct_collapse
does this this well, and forcats::fct_other
works for special situations.
For just a small example, it may seem like more work than it's worth:
library(dplyr)
df <- tribble(
~departments, ~budget,
"test1", 100,
"test2", 200,
"test3", 300
)
df %>%
mutate(departments = as.factor(departments) %>%
forcats::fct_collapse(test1 = c("test1", "test3"))) %>%
group_by(departments) %>%
summarise(budget = sum(budget))
#> # A tibble: 2 x 2
#> departments budget
#> <fct> <dbl>
#> 1 test1 400
#> 2 test2 200
But if I now add more departments, and need to do multiple of these operations, it scales. Here I need to collapse 3 levels into 1, and then another 2 into another 1.
df2 <- tribble(
~departments, ~budget,
"test1", 100,
"test2", 200,
"test3", 300,
"test4", 400,
"test5", 500
)
df2 %>%
mutate(departments = as.factor(departments) %>%
forcats::fct_collapse(test1 = c("test1", "test3", "test4"),
test2 = c("test2", "test5"))) %>%
group_by(departments) %>%
summarise(budget = sum(budget))
#> # A tibble: 2 x 2
#> departments budget
#> <fct> <dbl>
#> 1 test1 800
#> 2 test2 700
Upvotes: 2