user3206440
user3206440

Reputation: 5059

dplyr - group last n row values

I have a dataframe as below

+--------+-----------+-----+
|  make  |   model   | cnt |
+--------+-----------+-----+
| toyota |  camry    |  10 |
| toyota |  corolla  |   4 |
| honda  |  city     |   8 |
| honda  |  accord   |  13 |
| jeep   |  compass  |   3 |
| jeep   |  wrangler |   5 |
| jeep   |  renegade |   1 |
| accura |  x1       |   2 |
| accura |  x3       |   1 |
+--------+-----------+-----+

I need to aggregate this dataframe by Make so as to get the total volume and share - I do this as follows.

df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1))
dfc <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume))

The above operation gives me the share and volume aggregated by Make as below.

+--------+--------+-----------+
| make   | volume | share     |
+--------+--------+-----------+
| honda  | 21     | 44.680851 |
| toyota | 14     | 29.787234 |
| jeep   | 9      | 19.148936 |
| accura | 3      | 6.382979  |
+--------+--------+-----------+

I need to group everything except the first two rows to a group others and also aggregate the volume and share such that the dataframe would look like below.

+--------+--------+-----------+
| make   | volume | share     |
+--------+--------+-----------+
| honda  | 21     | 44.680851 |
| toyota | 14     | 29.787234 |
| others | 12     | 25.53191  |
+--------+--------+-----------+

Upvotes: 0

Views: 554

Answers (1)

AntoniosK
AntoniosK

Reputation: 16121

library(dplyr)

# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = F)

# specify number of rows
row_threshold = 2

df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume)) %>%
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume),
            share = sum(share))

# # A tibble: 3 x 3
#   Make_upd volume    share
#      <chr>  <dbl>    <dbl>
# 1    honda     21 44.68085
# 2   others     12 25.53191
# 3   toyota     14 29.78723

Upvotes: 4

Related Questions