Reputation: 3
I want to calculate the percentage share and create new columns using mutate. I have the following data:
country, metric, segment, value1990, value2000, value2010
canada, abc, rural, 10, 15, 16
canada, abc, urban, 12, 12, 18
canada, abc, total, 22, 27, 34
canada, xyz, rural, 6, 9, 10
canada, xyc, urban, 7, 8, 8
canada, xyc, total, 13, 17, 18
canada, population, rural, 80, 86, 95
canada, population, urban, 102, 110, 121
canada, population, total, 182, 196, 216
The data frame consists data from several countries and across several years. I want to create a new column with the following values
country, metric, segment, value, percent1990, percent2000, percent2010
canada, abc, rural, 10, 15, 16, 12.5%, 17.4%, 16.8%
canada, abc, urban, 12, 12, 18, 11.7%, 10.9%, 14.8%
canada, abc, total, 22, 27, 34, 12.1%, 13.7%, 15.7%
canada, xyz, rural, 6, 9, 10, 7.5%, 10.4%, 10.5%
canada, xyc, urban, 7, 8, 8, 6.8%, 7.2%, 6.6%
canada, xyc, total, 13, 17, 18, 7.22%, 8.6%, 8.3%
canada, population, rural, 80, 86, 95, 100%, 100%, 100%
canada, population, urban, 102, 110, 121, 100%, 100%, 100%
canada, population, total, 182, 196, 216, 100%, 100%, 100%
Essentially, I want to calculate the value variable's percentage share of the population depending on whether it is rural/urban/total, across multiple years.
E.g.
(row 1) percent_share = (10/80)*100 = 12.5%
(row 2) percent_share = (10/102)*100 = 11.76%
(row 3) percent_share = (10/182)*100 = 12.09%
I'm unable to go beyond the group_by
chaining to ascertain how to input the necessary function
df = df %>%
group_by (country, metric) %>%
mutate(...)
Upvotes: 0
Views: 82
Reputation: 33782
EDIT: for new question data containing years
This would be easier if you moved the years and the total population to new columns. Here's one way to do that.
Assuming your example data is in a data frame named df1
: first gather
the years.
library(dplyr)
library(tidyr)
df1 <- df1 %>% gather(Year, Value, 4:6)
Then filter for metric
== population
and join back to the original data.
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y)
Result:
country segment Year population metric value
1 canada rural value1990 80 abc 10
2 canada rural value1990 80 xyz 6
3 canada urban value1990 102 abc 12
4 canada urban value1990 102 xyc 7
5 canada total value1990 182 abc 22
6 canada total value1990 182 xyc 13
7 canada rural value2000 86 abc 15
8 canada rural value2000 86 xyz 9
9 canada urban value2000 110 abc 12
10 canada urban value2000 110 xyc 8
11 canada total value2000 196 abc 27
12 canada total value2000 196 xyc 17
13 canada rural value2010 95 abc 16
14 canada rural value2010 95 xyz 10
15 canada urban value2010 121 abc 18
16 canada urban value2010 121 xyc 8
17 canada total value2010 216 abc 34
18 canada total value2010 216 xyc 18
Then add a mutate:
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y) %>%
mutate(percent_share = 100 * (value / population))
Result:
country segment Year population metric value percent_share
1 canada rural value1990 80 abc 10 12.500000
2 canada rural value1990 80 xyz 6 7.500000
3 canada urban value1990 102 abc 12 11.764706
4 canada urban value1990 102 xyc 7 6.862745
5 canada total value1990 182 abc 22 12.087912
6 canada total value1990 182 xyc 13 7.142857
7 canada rural value2000 86 abc 15 17.441860
8 canada rural value2000 86 xyz 9 10.465116
9 canada urban value2000 110 abc 12 10.909091
10 canada urban value2000 110 xyc 8 7.272727
11 canada total value2000 196 abc 27 13.775510
12 canada total value2000 196 xyc 17 8.673469
13 canada rural value2010 95 abc 16 16.842105
14 canada rural value2010 95 xyz 10 10.526316
15 canada urban value2010 121 abc 18 14.876033
16 canada urban value2010 121 xyc 8 6.611570
17 canada total value2010 216 abc 34 15.740741
18 canada total value2010 216 xyc 18 8.333333
Upvotes: 1
Reputation: 2698
You can also just group by segment
and divide by max(value
), since the population value should be the largest:
df %>%
group_by(country, segment) %>%
mutate(percent_share = value / max(value))
# A tibble: 9 x 5
# Groups: segment [3]
country metric segment value percent_share
<chr> <chr> <chr> <dbl> <dbl>
1 canada abc rural 10 0.125
2 canada abc urban 12 0.118
3 canada abc total 22 0.121
4 canada xyz rural 6 0.075
5 canada xyc urban 7 0.0686
6 canada xyc total 13 0.0714
7 canada population rural 80 1
8 canada population urban 102 1
9 canada population total 182 1
Upvotes: 1