Greg

Reputation: 3670

Add a grouping variable based on ranked data

Consider the following dataframe:

name <- c("Sally", "Dave", "Aaron", "Jane", "Michael")
rank <- c(1,2,1,2,3)
df <- data.frame(name, rank, stringsAsFactors = FALSE)

I'd like to create a grouping variable (event) based on the rank column, like so:

event <- c("Hurdles", "Hurdles", "Long Jump", "Long Jump", "Long Jump")
df_desired <- data.frame(name, rank, event, stringsAsFactors = FALSE)

There are lots of examples of going the other way (making a ranking variable based on a group) but I can't seem to find one doing what I'd like.

It's possible to use filter, full_join and then fill as shown below, but is there a simpler way?

library(tidyverse)
df <- df %>% 
  mutate(order = row_number())

df_1 <- df %>% 
  filter(rank == 1)
df_1$event <- c("Hurdles", "Long Jump")

df %>% 
  filter(rank != 1) %>% 
  mutate(event = as.character(NA)) %>% 
  full_join(df_1, by = c("order", "name", "rank", "event")) %>% 
  arrange(order) %>% 
  fill(event) %>%
  select(-order)

Upvotes: 1

Views: 58

Answers (1)

akrun

Reputation: 887991

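We can use cumsum to create the index. Each rank == 1 marks the start of a new event, so cumsum(rank == 1) counts the events seen so far, and that count picks the matching event name: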

library(dplyr)
df %>% 
   mutate(event = c("Hurdles", "Long Jump")[cumsum(rank == 1)])
#      name rank     event
#1   Sally    1   Hurdles
#2    Dave    2   Hurdles
#3   Aaron    1 Long Jump
#4    Jane    2 Long Jump
#5 Michael    3 Long Jump

Or in base R (just in case)

df$event <- c("Hurdles", "Long Jump")[cumsum(df$rank == 1)]
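
To see why the index works, the intermediate steps can be spelled out. This is only a sketch in base R using the data from the question; the names starts, group_id and events are made up for illustration, and the stopifnot guard is an added assumption (it checks that the first row has rank 1 and that there are enough labels), not part of the answer above.

```r
# Rebuild the example data from the question
name <- c("Sally", "Dave", "Aaron", "Jane", "Michael")
rank <- c(1, 2, 1, 2, 3)
df <- data.frame(name, rank, stringsAsFactors = FALSE)

# TRUE wherever a new group starts (rank resets to 1)
starts <- df$rank == 1

# Running count of group starts, one value per row
group_id <- cumsum(starts)

# Hypothetical lookup vector: one label per group, in order of appearance
events <- c("Hurdles", "Long Jump")

# Assumed guard: an index of 0 (data not starting at rank 1) or an index
# beyond the labels would otherwise silently drop rows or produce NA
stopifnot(min(group_id) >= 1, max(group_id) <= length(events))

# Index the labels by group number
df$event <- events[group_id]
```

If the data contained more than two events, only the events vector would need to grow; the cumsum index stays the same.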

Upvotes: 4
