Reputation: 343
I have this dataframe exports:
date <- c("2012-10-01", "2012-10-01", "2012-10-01","2012-10-01","2012-10-01","2012-11-01","2012-11-01","2012-11-01","2012-11-01", "2012-11-01")
item <- c("A", "B", "D", "F", "C", "A", "B", "C","H","D")
amount <- c(20,50,35,55,32,40,53,23,74,99)
exports <- data.frame(date, item, amount)
So, for each month, we have a set of items that were exported (in the original dataset, on average there are 15 items per month). What I want is to get only the 4 highest values in the amount column, per month. So, for each month, I will have 4 rows, instead of 15 that I have now. From the first to the fourth in the month of January, from the first to the fourth in the month of February, and so on. I appreciate it if someone can help.
Ps: the main goal, after all that, is to plot a time series with the evolution of these 4 items. If I solve the problem above, the plot thing will be done, but I'm open to other solutions.
Upvotes: 1
Views: 178
Reputation: 388962
We can convert date to date class, extract year-month from the date and select first 4 values for each month.
library(dplyr)
library(lubridate)
exports %>%
mutate(date = ymd(date),
month_year = format(date, '%b %Y')) %>%
arrange(desc(amount)) %>%
group_by(month_year) %>%
slice_head(n = 4)
#Can also do
#slice(1:4)
Upvotes: 1
Reputation: 887078
We can use top_n
after grouping by 'month'
library(dplyr)
exports %>%
group_by(month = format(as.Date(date), '%m')) %>%
top_n(4, amount)
-output
# A tibble: 4 x 4
# Groups: month [1]
# date item amount month
# <chr> <chr> <dbl> <chr>
#1 2012-10-01 F 55 10
#2 2012-10-02 B 53 10
#3 2012-10-02 H 74 10
#4 2012-10-02 D 99 10
Or using slice_max
exports %>%
group_by(month = format(as.Date(date), '%m')) %>%
slice_max(order_by = amount, n = 4)
-output
# A tibble: 4 x 4
# Groups: month [1]
# date item amount month
# <chr> <chr> <dbl> <chr>
#1 2012-10-02 D 99 10
#2 2012-10-02 H 74 10
#3 2012-10-01 F 55 10
#4 2012-10-02 B 53 10
Upvotes: 2