Neil
Neil

Reputation: 8247

how to find top N descending values in group in dplyr

I have a following dataframe in R

  Serivce     Codes
   ABS         RT
   ABS         RT
   ABS         TY
   ABS         DR
   ABS         DR
   ABS         DR
   ABS         DR
   DEF         RT
   DEF         RT
   DEF         TY
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         TY
   DEF         SE
   DEF         SE

What I want is service wise code count in descending order

  Serivce     Codes    Count
   ABS         DR        4
   ABS         RT        2 
   ABS         TY        1
   DEF         DR        4
   DEF         RT        2
   DEF         TY        2  

I am doing following in r

df%>% 
group_by(Service,Codes) %>% 
summarise(Count = n()) %>%
top_n(n=3,wt = Count) %>% 
arrange(desc(Count)) %>% 
as.data.frame()   

But,it does not give me what is intended.

Upvotes: 8

Views: 10649

Answers (3)

user6320548
user6320548

Reputation: 11

df%>% count(Service,Codes) %>% mutate(rank = dense_rank(desc(n))) %>% filter(rank < 5)

number of rows to return for top_n() just like row_number()

n is group by Service,Codes then count like

Upvotes: 1

lmo
lmo

Reputation: 38500

In base R, you can do this in two lines.

# get data.frame of counts by service-code pairs
mydf <- data.frame(table(dat))

# get top 3 by service
do.call(rbind, lapply(split(mydf, mydf$Serivce), function(x) x[order(-x$Freq)[1:3],]))

This returns

      Serivce Codes Freq
ABS.1     ABS    DR    4
ABS.3     ABS    RT    2
ABS.7     ABS    TY    1
DEF.2     DEF    DR    4
DEF.4     DEF    RT    2
DEF.6     DEF    SE    2

In the first line use table to get the counts, then convert to data.frame. In the second line, split by service, order by the negative values of order and pull out the first three elements. Combine the results with do.call.

Upvotes: 0

akrun
akrun

Reputation: 886938

We can try with count/arrange/slice

df1 %>% 
   count(Service, Codes) %>%
   arrange(desc(n)) %>% 
   group_by(Service) %>% 
   slice(seq_len(3))
# A tibble: 6 x 3
# Groups:   Service [2]
#  Service Codes     n
#    <chr> <chr> <int>
#1     ABS    DR     4
#2     ABS    RT     2
#3     ABS    TY     1
#4     DEF    DR     4
#5     DEF    RT     2
#6     DEF    SE     2

In the OP's code, we need to arrange by 'Service' too. As @Marius said in the comments, the top_n will include more number of rows if there are ties. One option is to do a second grouping with 'Service' and slice (as showed above) or after the grouping, we can filter

df1 %>% 
  group_by(Service,Codes) %>%
  summarise(Count = n()) %>%
  top_n(n=3,wt = Count)  %>%
  arrange(Service, desc(Count)) %>%
  group_by(Service) %>%
  filter(row_number() <=3)

Upvotes: 10

Related Questions