Othmane Bentaouzer
Othmane Bentaouzer

Reputation: 33

How to sum specific rows in a dataframe using R?

I'm working on a research paper and I've got a dataframe that includes some departments and their budgets over a period of time. Let's take the following dataframe as an example.

df
departments   budget
  <chr>       <dbl>
 test1        100
 test2        200
 test3        300

For my case, "test1" and "test3" are two different expressions that actually refer to the same department. So I need to sum their budgets.

Here's the result that I expect

df
departments   budget
  <chr>       <dbl>
 test1        400
 test2        200

Upvotes: 3

Views: 253

Answers (2)

Haritz Laboa
Haritz Laboa

Reputation: 768

There is no need of using IDs. If your goal is combining every test3 with test1, and getting the sum of budget of this join, you can use dplyr functions like this:

library(dplyr)

df %>%
  mutate(departments, departments = ifelse(departments=="test3", "test1", departments)) %>%
  group_by(departments) %>% 
  count(departments, wt=budget) -> df

The code above will give you the result you are looking for.

Upvotes: 1

camille
camille

Reputation: 16832

For a very small example with only 2 values being collapsed into 1, something like ifelse is fine. But it doesn't scale well beyond a single operation collapsing a single pair of values—you'll be stuck in lots of nested ifelse statements or maybe a case_when. I do these operations a lot, and recommend converting your variable to a factor so you can more easily and flexibly manipulate levels. forcats::fct_collapse does this this well, and forcats::fct_other works for special situations.

For just a small example, it may seem like more work than it's worth:

library(dplyr)

df <- tribble(
  ~departments, ~budget,
  "test1",      100,
  "test2",      200,
  "test3",      300
)

df %>%
  mutate(departments = as.factor(departments) %>%
           forcats::fct_collapse(test1 = c("test1", "test3"))) %>%
  group_by(departments) %>%
  summarise(budget = sum(budget))
#> # A tibble: 2 x 2
#>   departments budget
#>   <fct>        <dbl>
#> 1 test1          400
#> 2 test2          200

But if I now add more departments, and need to do multiple of these operations, it scales. Here I need to collapse 3 levels into 1, and then another 2 into another 1.

df2 <- tribble(
  ~departments, ~budget,
  "test1",      100,
  "test2",      200,
  "test3",      300,
  "test4",      400,
  "test5",      500
)

df2 %>%
  mutate(departments = as.factor(departments) %>%
           forcats::fct_collapse(test1 = c("test1", "test3", "test4"),
                                 test2 = c("test2", "test5"))) %>%
  group_by(departments) %>%
  summarise(budget = sum(budget))
#> # A tibble: 2 x 2
#>   departments budget
#>   <fct>        <dbl>
#> 1 test1          800
#> 2 test2          700

Upvotes: 2

Related Questions