Arthur Carvalho Brito
Arthur Carvalho Brito

Reputation: 560

use dplyr to display level corresponding to the largest sum of some variable variable

I am not sure how to succinctly search for this or to title the question, so apologies in advance if this is a duplicate.

I want to obtain df_1 from df_0 using tidyverse tools. For each sector, the column city in df_1 should give me the one with more jobs for the given sector. Column jobs should display the sum of jobs in a sector, irrespective of city.

 df_0 <- data.frame(
      sector = rep(c('S1', 'S2', 'S3', 'S2', 'S1'), each =  2),
      city = rep(c('A', 'B', 'C', 'B', 'C'), times = 2),
      jobs = 1:10)


df_1 <- data.frame(
  sector = c('S1', 'S2', 'S3'),
  city = c('B', 'B', 'A'),
  jobs = c(22, 22, 11)
)

Thank you!

Upvotes: 0

Views: 31

Answers (1)

r2evans
r2evans

Reputation: 160577

df_0 %>%
  group_by(sector, city) %>%
  summarize(jobs = sum(jobs)) %>%
  group_by(sector) %>%
  summarize(city = city[which.max(jobs)], jobs = sum(jobs)) %>%
  ungroup()
# # A tibble: 3 x 3
#   sector city   jobs
#   <chr>  <chr> <int>
# 1 S1     B        22
# 2 S2     B        22
# 3 S3     A        11

Upvotes: 1

Related Questions