Reputation: 59
I am carrying out an analysis on some Italian regions. I have a dataset similar to the following:
mydata <- data.frame(date = c(2020, 2021, 2020, 2021, 2020, 2021),
                     Region = c('Sicilia', 'Sicilia', 'Sardegna', 'Sardegna', 'Campania', 'Campania'),
                     Number = c(20, 30, 50, 70, 90, 69))
Now I have to create two new columns. The first (called 'Total Population') should contain a fixed number for each region (for example, every row for Sicilia will have 'Total Population' = 250). The second column should contain the percentage ratio between the value in the 'Number' column and the corresponding value of 'Total Population' (for example, for the first Sicilia row the value will be 20/250, and so on). I hope I explained myself well. Thank you very much.
Upvotes: 0
Views: 42
Reputation: 5429
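Like this, perhaps:
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
# Per region: total of Number, and each row's share of that total
mydata %<>% group_by(Region) %>%
  mutate(
    `Total Population` = sum(Number),
    `Ratio of Total` = sprintf("%.1f%%", 100 * Number / sum(Number))
  )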
mydata is now:
> mydata
# A tibble: 6 x 5
# Groups:   Region [3]
   date Region   Number `Total Population` `Ratio of Total`
  <dbl> <chr>     <dbl>              <dbl> <chr>
1  2020 Sicilia      20                 50 40.0%
2  2021 Sicilia      30                 50 60.0%
3  2020 Sardegna     50                120 41.7%
4  2021 Sardegna     70                120 58.3%
5  2020 Campania     90                159 56.6%
6  2021 Campania     69                159 43.4%
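The code above derives 'Total Population' from sum(Number) within each region. If the total is instead meant to be a fixed, externally supplied figure (as in the question's "Sicilia = 250" example), one possible approach is to keep those figures in a small lookup table and join it on Region. This is only a sketch: the `population` table, the `result` name and the `Ratio (%)` column name are made up for illustration, and the values for Sardegna and Campania are placeholders, not real data.

```r
library(dplyr)

mydata <- data.frame(
  date   = c(2020, 2021, 2020, 2021, 2020, 2021),
  Region = c('Sicilia', 'Sicilia', 'Sardegna', 'Sardegna', 'Campania', 'Campania'),
  Number = c(20, 30, 50, 70, 90, 69)
)

# Hypothetical lookup table: 250 for Sicilia comes from the question;
# the other two values are placeholders to be replaced with real figures.
population <- data.frame(
  Region = c('Sicilia', 'Sardegna', 'Campania'),
  `Total Population` = c(250, 400, 500),
  check.names = FALSE  # keep the space in the column name
)

# Attach the fixed population to every row of its region,
# then compute Number as a percentage of that population.
result <- mydata %>%
  left_join(population, by = "Region") %>%
  mutate(`Ratio (%)` = 100 * Number / `Total Population`)
```

With these figures the two Sicilia rows get 8 and 12 in `Ratio (%)`, i.e. 20/250 and 30/250 expressed as percentages. Keeping the ratio numeric (rather than formatting it with sprintf) leaves it usable for further calculations; it can be formatted for display afterwards.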
Upvotes: 1