Karl Wig
Karl Wig

Reputation: 65

R add column in data frame with quarterly ranking by a characteristic

I want to adda new column to my data frame with ranks by date (in this case quarters, so it would be possible to do it by month). Firms should be ranked by their assets in that quarter/month.

There number of firms (id) in each quarter differs, and some new might enter and some old may disappear.

I want to go from this

#   id     assets date
# 1   X1   50   1994-03-31
# 2   X2   120  1994-03-31
# 3   X3   530  1994-03-31
# 4   X4   24   1994-03-31
# 6   X3   57   1994-06-30
# 7   X1   445  1994-06-30
# 8   X10  525  1994-06-30

to this

#   id     assets date      rank
# 1   X1   50   1994-03-31  3
# 2   X2   120  1994-03-31  2
# 3   X3   530  1994-03-31  1
# 4   X4   24   1994-03-31  4
# 6   X3   57   1994-06-30  3
# 7   X1   445  1994-06-30  2
# 8   X10  525  1994-06-30  1

I've tried:

temp_asset_rank <- temp_asset_rank %>%
  mutate(yearx = year(date)) %>%
  mutate(month = month(date)) %>%
  group_by(yearx, month) %>%
  mutate(ranking = rank(temp_asset_rank$assets, na.last = NA, ties.method = c("average"))) %>%
  ungroup()

But it returns:

Error: Column `ranking` must be length 11788 (the group size) or one, not 1188563

As you can see my data set is actually much larger and contain additional columns.

changing

group_by(yearx, month)

to

group_by(yearx) %>%
group_by(month)

does not work either

Can you help me?

Upvotes: 1

Views: 63

Answers (1)

hello_friend
hello_friend

Reputation: 5788

Base R solution:

within(df[order(df$assets, decreasing = TRUE),], 
       {rank <- ave(assets, date, FUN = seq.int)})

Tidyverse solution:

library(tidyverse)
df %>% 
  mutate(idx = row_number()) %>% 
  arrange(desc(assets)) %>% 
  group_by(date) %>% 
  mutate(rank = row_number()) %>% 
  ungroup() %>% 
  arrange(idx) %>% 
  select(-idx)

Data:

df <- structure(list(id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"), 
  assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L), 
  date = c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31", "1994-06-30",
           "1994-06-30", "1994-06-30")), class = "data.frame", row.names = c(NA, -7L))

Upvotes: 1

Related Questions