Reputation: 2185
I am having a data set:
Security %market value return Quintile*
1 0.07 100 3
2 0.10 88 2
3 0.08 78 1
4 0.12 59 1
5 0.20 106 4
6 0.04 94 3
7 0.05 111 5
8 0.10 83 2
9 0.06 97 3
10 0.03 90 3
11 0.15 119 5
the actual data set is having more than 5,000 rows, and I would like to use R to create 5 quintiles, each quintile is suppose to have 20% of market value. In addition, they have to be ranked in the order of magnitude of return. That is, 1st quintile should contain the 20% securities with the lowest return value, 5th quintile should contain the 20% securities with the highest return value. I would like to create the column "Quintile", among different quintiles there can be different numbers of securities but total %market value should be same. I have tries several methods and I am very new to R, so please kindly provide me some help. Thank you very much in advance!
Samuel
Upvotes: 2
Views: 466
Reputation: 59970
You can order your data and then use findInterval (adding a small delta to use closed right sided braces):
raw_data <- raw_data[order(raw_data$return),]
raw_data$Q2 <- findInterval( cumsum(raw_data$marketvalue) , seq(0,1,length=5)+0.000001 , right = T )
raw_data
# Security marketvalue return Quintile Q2
#4 4 0.12 59 1 1
#3 3 0.08 78 1 1
#8 8 0.10 83 2 2
#2 2 0.10 88 2 2
#10 10 0.03 90 3 3
#6 6 0.04 94 3 3
#9 9 0.06 97 3 3
#1 1 0.07 100 3 3
#5 5 0.20 106 4 4
#7 7 0.05 111 5 5
#11 11 0.15 119 5 5
Upvotes: 1
Reputation: 89057
The following works with your data.
First, sort by increasing return
:
dat <- dat[order(dat$return), ]
Then, compute the cumulative market
share and cut every 0.2:
dat$Quintile <- ceiling(cumsum(dat$market) / 0.2)
Finally, sort things back by Security
:
dat <- dat[order(dat$Security), ]
Upvotes: 1