Jones
Jones

Reputation: 343

How to get only the n highest values per month

I have this dataframe exports:

date <- c("2012-10-01", "2012-10-01", "2012-10-01","2012-10-01","2012-10-01","2012-11-01","2012-11-01","2012-11-01","2012-11-01", "2012-11-01")
item <- c("A", "B", "D", "F", "C", "A", "B", "C","H","D")
amount <- c(20,50,35,55,32,40,53,23,74,99)

exports <- data.frame(date, item, amount)

So, for each month, we have a set of items that were exported (in the original dataset, on average there are 15 items per month). What I want is to get only the 4 highest values in the amount column, per month. So, for each month, I will have 4 rows, instead of 15 that I have now. From the first to the fourth in the month of January, from the first to the fourth in the month of February, and so on. I appreciate it if someone can help.

Ps: the main goal, after all that, is to plot a time series with the evolution of these 4 items. If I solve the problem above, the plot thing will be done, but I'm open to other solutions.

Upvotes: 1

Views: 178

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388962

We can convert date to date class, extract year-month from the date and select first 4 values for each month.

library(dplyr)
library(lubridate)

exports %>%
  mutate(date = ymd(date), 
         month_year = format(date, '%b %Y')) %>%
  arrange(desc(amount)) %>%
  group_by(month_year) %>%
  slice_head(n = 4)
  #Can also do
  #slice(1:4)

Upvotes: 1

akrun
akrun

Reputation: 887078

We can use top_n after grouping by 'month'

library(dplyr)
exports %>% 
     group_by(month = format(as.Date(date), '%m')) %>%
     top_n(4, amount)

-output

# A tibble: 4 x 4
# Groups:   month [1]
#  date       item  amount month
#  <chr>      <chr>  <dbl> <chr>
#1 2012-10-01 F         55 10   
#2 2012-10-02 B         53 10   
#3 2012-10-02 H         74 10   
#4 2012-10-02 D         99 10   

Or using slice_max

exports %>% 
   group_by(month = format(as.Date(date), '%m')) %>% 
   slice_max(order_by = amount, n = 4)

-output

# A tibble: 4 x 4
# Groups:   month [1]
#  date       item  amount month
#  <chr>      <chr>  <dbl> <chr>
#1 2012-10-02 D         99 10   
#2 2012-10-02 H         74 10   
#3 2012-10-01 F         55 10   
#4 2012-10-02 B         53 10   

Upvotes: 2

Related Questions