Reputation: 560
I am not sure how to succinctly search for this or to title the question, so apologies in advance if this is a duplicate.
I want to obtain df_1
from df_0
using tidyverse tools. For each sector, the column city
in df_1
should give me the one with more jobs
for the given sector. Column jobs
should display the sum of jobs in a sector, irrespective of city.
df_0 <- data.frame(
sector = rep(c('S1', 'S2', 'S3', 'S2', 'S1'), each = 2),
city = rep(c('A', 'B', 'C', 'B', 'C'), times = 2),
jobs = 1:10)
df_1 <- data.frame(
sector = c('S1', 'S2', 'S3'),
city = c('B', 'B', 'A'),
jobs = c(22, 22, 11)
)
Thank you!
Upvotes: 0
Views: 31
Reputation: 160577
df_0 %>%
group_by(sector, city) %>%
summarize(jobs = sum(jobs)) %>%
group_by(sector) %>%
summarize(city = city[which.max(jobs)], jobs = sum(jobs)) %>%
ungroup()
# # A tibble: 3 x 3
# sector city jobs
# <chr> <chr> <int>
# 1 S1 B 22
# 2 S2 B 22
# 3 S3 A 11
Upvotes: 1