Reputation: 5059
I have a dataframe as below
+--------+-----------+-----+
| make | model | cnt |
+--------+-----------+-----+
| toyota | camry | 10 |
| toyota | corolla | 4 |
| honda | city | 8 |
| honda | accord | 13 |
| jeep | compass | 3 |
| jeep | wrangler | 5 |
| jeep | renegade | 1 |
| accura | x1 | 2 |
| accura | x3 | 1 |
+--------+-----------+-----+
I need to aggregate this dataframe by Make
so as to get the total volume and share - I do this as follows.
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1))
dfc <- df %>%
group_by(Make) %>%
summarise(volume = sum(Cnt)) %>%
mutate(share=volume/sum(volume)*100.0) %>%
arrange(desc(volume))
The above operation gives me the share
and volume
aggregated by Make
as below.
+--------+--------+-----------+
| make | volume | share |
+--------+--------+-----------+
| honda | 21 | 44.680851 |
| toyota | 14 | 29.787234 |
| jeep | 9 | 19.148936 |
| accura | 3 | 6.382979 |
+--------+--------+-----------+
I need to group everything except the first two rows
to a group others
and also aggregate the volume
and share
such that the dataframe would look like below.
+--------+--------+-----------+
| make | volume | share |
+--------+--------+-----------+
| honda | 21 | 44.680851 |
| toyota | 14 | 29.787234 |
| others | 12 | 25.53191 |
+--------+--------+-----------+
Upvotes: 0
Views: 554
Reputation: 16121
library(dplyr)
# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = F)
# specify number of rows
row_threshold = 2
df %>%
group_by(Make) %>%
summarise(volume = sum(Cnt)) %>%
mutate(share=volume/sum(volume)*100.0) %>%
arrange(desc(volume)) %>%
group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
summarise(volume = sum(volume),
share = sum(share))
# # A tibble: 3 x 3
# Make_upd volume share
# <chr> <dbl> <dbl>
# 1 honda 21 44.68085
# 2 others 12 25.53191
# 3 toyota 14 29.78723
Upvotes: 4