Reputation: 967
I have this tibble.
# A tibble: 214 x 3
launch_year state_name n
<int> <fct> <int>
1 1965 France 1
2 1966 France 1
3 1966 Japan 2
4 1967 France 2
5 1967 Italy 1
6 1967 Japan 1
7 1968 I-ELDO 1
8 1969 I-ELDO 1
9 1969 Japan 1
10 1970 China 1
I'd like to add a column for proportions. It would look something like this:
launches_processed %>%
count(launch_year, state_name) %>%
mutate(prop = [launches by state_name] / [total launches that year] * 100)
I can set [launches by state_name] equal to n. How do I get [total launches that year]?
Upvotes: 0
Views: 402
Reputation: 1702
Simply group by launch_year to get the total for each year, and add that total as a new column. Then rename your n column, since it holds the count by state and year, and divide it by the year's total.
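library(tidyverse)
# launches_processed is assumed to be the counted tibble (launch_year, state_name, n)
launches_processed %>%
  group_by(launch_year) %>%
  mutate(`total launches that year` = sum(n)) %>%  # total per year, repeated on each row
  rename(`launches by state_name` = n) %>%
  mutate(prop = `launches by state_name` / `total launches that year` * 100) %>%
  ungroup()  # drop the grouping so later steps work on the whole table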
Result:
launch_year state_name `launches by state_name` `total launches that year` prop
<dbl> <chr> <dbl> <dbl> <dbl>
1 1965 France 1 1 100
2 1966 France 1 3 33.3
3 1966 Japan 2 3 66.7
4 1967 France 2 4 50
5 1967 Italy 1 4 25
6 1967 Japan 1 4 25
7 1968 I-ELDO 1 1 100
8 1969 I-ELDO 1 2 50
9 1969 Japan 1 2 50
10 1970 China 1 1 100
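If the two helper columns are not needed, the same idea can be sketched more compactly by dividing by sum(n) inside the grouped mutate(). This is only a sketch: launches_counted and with_prop are names I made up, and the data is the first ten rows of the question's tibble retyped by hand.

```r
library(dplyr)

# First ten rows of the counted tibble from the question, retyped by hand
# (launches_counted is a hypothetical name, not from the original post)
launches_counted <- tibble(
  launch_year = c(1965L, 1966L, 1966L, 1967L, 1967L, 1967L, 1968L, 1969L, 1969L, 1970L),
  state_name  = factor(c("France", "France", "Japan", "France", "Italy", "Japan",
                         "I-ELDO", "I-ELDO", "Japan", "China")),
  n           = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L)
)

with_prop <- launches_counted %>%
  group_by(launch_year) %>%            # sum(n) is now computed within each year
  mutate(prop = n / sum(n) * 100) %>%  # share of that year's launches, in percent
  ungroup()                            # drop the grouping for later steps
```

The prop column should match the one in the result above; the only difference is that the per-year total is never stored as its own column.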
Upvotes: 2
Reputation: 872
You should be able to achieve this by using a combination of group_by() and ungroup() in dplyr.
library(dplyr)
df <- data.frame(stringsAsFactors=FALSE,
V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
launch_year = c(1965, 1966, 1966, 1967, 1967, 1967, 1968, 1969, 1969, 1970),
state_name = c("France", "France", "Japan", "France", "Italy", "Japan",
"I-ELDO", "I-ELDO", "Japan", "China"),
V4 = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1)
)
df %>%
  # V4 already holds the launch count for each row, so use it as the weight
  count(launch_year, state_name, wt = V4) %>%
  group_by(launch_year) %>%
  mutate(launches_that_year = sum(n)) %>%
  ungroup() %>%
  # share of that year's launches: state count divided by the year's total
  mutate(prop = n / launches_that_year * 100)
#> # A tibble: 10 x 5
#>    launch_year state_name     n launches_that_year  prop
#>          <dbl> <chr>      <dbl>              <dbl> <dbl>
#>  1        1965 France         1                  1 100
#>  2        1966 France         1                  3  33.3
#>  3        1966 Japan          2                  3  66.7
#>  4        1967 France         2                  4  50
#>  5        1967 Italy          1                  4  25
#>  6        1967 Japan          1                  4  25
#>  7        1968 I-ELDO         1                  1 100
#>  8        1969 I-ELDO         1                  2  50
#>  9        1969 Japan          1                  2  50
#> 10        1970 China          1                  1 100
Created on 2019-02-10 by the reprex package (v0.2.0).
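A quick sanity check for any version of this: within each launch_year the proportions should add up to 100, and no single value should exceed 100. Below is a minimal sketch of that check. It assumes V4 holds the per-row launch counts (it matches n in the question), and props and check are names I picked for illustration.

```r
library(dplyr)

df <- data.frame(
  launch_year = c(1965, 1966, 1966, 1967, 1967, 1967, 1968, 1969, 1969, 1970),
  state_name  = c("France", "France", "Japan", "France", "Italy", "Japan",
                  "I-ELDO", "I-ELDO", "Japan", "China"),
  V4          = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: V4 is the launch count per (year, state) row, so weight by it
props <- df %>%
  count(launch_year, state_name, wt = V4) %>%
  group_by(launch_year) %>%
  mutate(prop = n / sum(n) * 100) %>%
  ungroup()

# One row per year with the sum of that year's proportions
check <- props %>%
  group_by(launch_year) %>%
  summarise(total_prop = sum(prop))

# Stops with an error if any year does not add up to 100 (within rounding)
stopifnot(all(abs(check$total_prop - 100) < 1e-8))
```

If the stopifnot() call fails, the numerator and denominator are probably swapped or the grouping is on the wrong column.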
Upvotes: 2