Reputation: 207
I have a data frame like these:
NUM_TURNO CODIGO_MUNICIPIO SIGLA_PARTIDO SHARE
1 1 81825 PPB 38.713318
2 1 81825 PMDB 61.286682
3 1 09717 PMDB 48.025900
4 1 09717 PL 1.279217
5 1 09717 PFL 50.694883
6 1 61921 PMDB 51.793868
This is a data.frame of elections in Brazil. Grouping by NUM_TURNO and CODGIDO_MUNICIPIO I want to compare the SHARE of the FIRST and SECOND most votted politics in each city and round (1 or 2) and create a new column.
What am I having problem to do? I don't know how to calculate the difference only for the two biggest SHARES of votes.
For the first case, for example, I want to create something that gives me the difference between 61.286682 and 38.713318 = 22.573364 and so on.
Something like this:
df %>%
group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
mutate(Diff = HIGHER SHARE - 2º HIGHER SHARE))
Upvotes: 1
Views: 31
Reputation: 8072
You can also use top_n
from dplyr
with grouping and summarizing. Keep in mind that in the data you provided, you will get an error in summarize
if you use diff
with a single value, hence the use of ifelse
.
df %>%
group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
top_n(2, SHARE) %>%
summarize(Diff = ifelse(n() == 1, NA, diff(SHARE)))
# A tibble: 3 x 3
# Groups: NUM_TURNO [?]
NUM_TURNO CODIGO_MUNICIPIO Diff
<dbl> <dbl> <dbl>
1 1 9717 2.67
2 1 61921 NA
3 1 81825 22.6
Upvotes: 2
Reputation: 748
You could arrange your dataframe by Share
and then slice
the first two values. Then you could use summarise
to get the diff between the values for every group:
library(dplyr)
df %>%
group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
arrange(desc(Share)) %>%
slice(1:2) %>%
summarise(Diff = -diff(Share))
Upvotes: 1