Reputation: 51
I have panel data which contains 3 columns: Firm, Year, Income. The number of observations are different for each firm. For example, Firm BBB has 20 observations while Firm AAA just has 19.
Firm <- rep(c("AAA","BBB","CCC","DDD","EEE"), each=20)
Year <- rep(seq(1997,2016,1),times=5)
Income <- rnorm(100, mean=50, sd=10)
df <- cbind(Firm,Year,Income)
df <- as.data.frame(df)
df <- df[!(df$Firm=="AAA" & df$Year=="2016"),]
The actual data contains more than 5000 firms and more than 50 years for each firm. The start date and end date for each firm are all different. But this is a good enough example to use.
I would like to assign quintile to the income for all the firms for EACH YEAR separately and put that in a new column called "Quint". For example, for year 1997, Firm "AAA" has income 50, Firm "BBB" has income 49, Firm "CCC" has income 48, Firm "DDD" has income 47, and Firm "EEE" has income 46. So Firm "AAA" has 1, "BBB" has 2, "CCC" has 3, "DDD" has 4, and "EEE" has 5, for 1997.
I have a custom function that does it for all the years but can't perform for EACH YEAR:
quan <- function (x){
y <-ifelse(x<=quantile(x,c(.2),na.rm=TRUE), 1,
ifelse(x>quantile(x,c(.2),na.rm=TRUE)&x<=quantile(x,c(.4),na.rm=TRUE), 2,
ifelse(x>quantile(x,c(.4),na.rm=TRUE)&x<=quantile(x,c(.6),na.rm=TRUE), 3,
ifelse(x>quantile(x,c(.6),na.rm=TRUE)&x<=quantile(x,c(.8),na.rm=TRUE), 4,
ifelse(x>quantile(x,c(.8),na.rm=TRUE), 5, NA)))))
y
}
How to do it for each year? Thanks.
Upvotes: 2
Views: 958
Reputation: 93871
With the dplyr
package you can use the ntile
function to calculate the quintiles. Group by Year
to get quintiles by year. Also, note the change from cbind
to data.frame
in your data-creation code. cbind
creates a matrix and all values in a matrix must be of the same type, so the numeric values are coerced to character strings.
df <- data.frame(Firm, Year, Income)
df <- df[!(df$Firm=="AAA" & df$Year=="2016"),]
library(dplyr)
df = df %>% group_by(Year) %>%
mutate(Quint = ntile(Income, 5))
Firm Year Income Quint <fctr> <dbl> <dbl> <int> 1 AAA 1997 66.99350 5 2 AAA 1998 55.18437 3 3 AAA 1999 31.39550 1 4 AAA 2000 57.39199 4 5 AAA 2001 41.03834 3 6 AAA 2002 51.85919 4 7 AAA 2003 38.21712 3 8 AAA 2004 45.97977 4 9 AAA 2005 47.62680 3 10 AAA 2006 48.78366 3 # ... with 89 more rows
With base R, you could use lapply
and split
to operate on each Year
separately, and the cut
function to group by quintile:
df = do.call(rbind,
lapply(split(df, df$Year), function(x) {
data.frame(x, Quint=cut(x[ , "Income"],
quantile(x[,"Income"], probs=seq(0,1,0.2)),
labels=1:5,
include.lowest=TRUE))
}))
Upvotes: 3