Dylan
Dylan

Reputation: 51

Assign Quintiles for each year in Panel data with different number of observations

I have panel data which contains 3 columns: Firm, Year, Income. The number of observations are different for each firm. For example, Firm BBB has 20 observations while Firm AAA just has 19.

Firm <- rep(c("AAA","BBB","CCC","DDD","EEE"), each=20)
Year <- rep(seq(1997,2016,1),times=5)
Income <- rnorm(100, mean=50, sd=10)
df <- cbind(Firm,Year,Income)
df <- as.data.frame(df) 
df <- df[!(df$Firm=="AAA" & df$Year=="2016"),]

The actual data contains more than 5000 firms and more than 50 years for each firm. The start date and end date for each firm are all different. But this is a good enough example to use.

I would like to assign quintile to the income for all the firms for EACH YEAR separately and put that in a new column called "Quint". For example, for year 1997, Firm "AAA" has income 50, Firm "BBB" has income 49, Firm "CCC" has income 48, Firm "DDD" has income 47, and Firm "EEE" has income 46. So Firm "AAA" has 1, "BBB" has 2, "CCC" has 3, "DDD" has 4, and "EEE" has 5, for 1997.

I have a custom function that does it for all the years but can't perform for EACH YEAR:

quan <- function (x){
y <-ifelse(x<=quantile(x,c(.2),na.rm=TRUE), 1,
ifelse(x>quantile(x,c(.2),na.rm=TRUE)&x<=quantile(x,c(.4),na.rm=TRUE), 2,
ifelse(x>quantile(x,c(.4),na.rm=TRUE)&x<=quantile(x,c(.6),na.rm=TRUE), 3,        
ifelse(x>quantile(x,c(.6),na.rm=TRUE)&x<=quantile(x,c(.8),na.rm=TRUE), 4,
ifelse(x>quantile(x,c(.8),na.rm=TRUE), 5, NA)))))
y
}

How to do it for each year? Thanks.

Upvotes: 2

Views: 958

Answers (1)

eipi10
eipi10

Reputation: 93871

With the dplyr package you can use the ntile function to calculate the quintiles. Group by Year to get quintiles by year. Also, note the change from cbind to data.frame in your data-creation code. cbind creates a matrix and all values in a matrix must be of the same type, so the numeric values are coerced to character strings.

df <- data.frame(Firm, Year, Income)
df <- df[!(df$Firm=="AAA" & df$Year=="2016"),]

library(dplyr)

df = df %>% group_by(Year) %>% 
  mutate(Quint = ntile(Income, 5))
     Firm  Year   Income    Quint
   <fctr> <dbl>    <dbl>    <int>
1     AAA  1997 66.99350        5
2     AAA  1998 55.18437        3
3     AAA  1999 31.39550        1
4     AAA  2000 57.39199        4
5     AAA  2001 41.03834        3
6     AAA  2002 51.85919        4
7     AAA  2003 38.21712        3
8     AAA  2004 45.97977        4
9     AAA  2005 47.62680        3
10    AAA  2006 48.78366        3
# ... with 89 more rows

With base R, you could use lapply and split to operate on each Year separately, and the cut function to group by quintile:

df = do.call(rbind, 
             lapply(split(df, df$Year), function(x) {
               data.frame(x, Quint=cut(x[ , "Income"], 
                                       quantile(x[,"Income"], probs=seq(0,1,0.2)), 
                                       labels=1:5,
                                       include.lowest=TRUE))
             }))

Upvotes: 3

Related Questions