Reputation: 830
Until now I have selected values by their magnitude: for example, given a row of n numbers, I picked the numbers that lie between a and b. What I actually need is to select by percentile, i.e. find the values between the a-th and b-th percentiles.
I have been using this:
a <- 0.05
b <- 0.4
colnames(data[,which(data > a & data < b)])
What I need is to split my row into deciles: the highest 10% of values, then the ones that lie between 10% and 20%, and so on up to 90%-100%. Values must not overlap between the deciles, and the number of values in my data is not an exact multiple of 10.
EDIT: I have the following chunk of my data:
dput(data)
structure(list(AN8068571086 = c(0.501692168, 0.197414678, 0.415273482,
0.3078506, 0.36441391, 0.492483978, 0.398119861, 0.501925374,
0.660172121, 0.379188187), BMG3223R1088 = c(0.402426587, 0.214836776,
0.328226835, 0.265325336, 0.25724501, 0.396151915, 0.377199761,
0.31474308, 0.484177362, 0.412847814), BMG4388N1065 = c(0.592822703,
0.308105268, 0.374769701, 0.563959456, 0.335778936, 0.455266056,
0.510205508, 0.384208097, 0.460911179, 0.408350205), BMG6359F1032 = c(0.41153064,
0.221527294, 0.37383843, 0.329890556, 0.356333922, 0.397373547,
0.387519253, 0.424925141, 0.578383479, 0.411399158), BMG7496G1033 = c(0.478470376,
0.222667989, 0.33437412, 0.352835697, 0.299427154, 0.573123951,
0.466177145, 0.447775951, 0.477199807, 0.514107898), BMG812761002 = c(0.317522103,
0.265366064, 0.397487594, 0.348840651, 0.428338929, 0.282390173,
0.571658903, 0.450001013, 0.864445892, 0.418532333), CA88157K1012 = c(0.512859762,
0.183395043, 0.36847587, 0.364320833, 0.41197194, 0.628829565,
0.357019295, 0.341567448, 0.536733877, 0.343791549), CH0044328745 = c(0.499076264,
0.203778437, 0.310663532, 0.288884148, 0.247539664, 0.293768434,
0.348647329, 0.171457967, 0.391893463, 0.520079294), CH0048265513 = c(0.392308285,
0.245092722, 0.406807313, 0.338218477, 0.337216158, 0.396477472,
0.444780447, 0.513073443, 0.5655301, 0.372365682), GB00B4VLR192 = c(0.371059427,
0.243691452, 0.382559417, 0.36669396, 0.331187524, 0.336644629,
0.386660867, 0.408767967, 0.570252986, 0.350705351)), .Names = c("AN8068571086",
"BMG3223R1088", "BMG4388N1065", "BMG6359F1032", "BMG7496G1033",
"BMG812761002", "CA88157K1012", "CH0044328745", "CH0048265513",
"GB00B4VLR192"), row.names = c(NA, -10L), class = "data.frame")
The process should work as follows: (1) loop across rows, (2) find the lowest 10% of values, (3) get the colnames of the columns where those lowest 10% of values are, and store them in a list. The code below is what I had before; it searches for the column names whose row value lies between a and b. All I need is the column names, not the actual values from the row.
stockpicks <- list()
a <- 0.3
b <- 0.7
for (i in seq_len(nrow(data))) {
  # take row i as a 1 x ncol matrix
  input <- as.matrix(data[i, ])
  # extract colnames of values between a and b
  efficient <- matrix(colnames(data)[which(input > a & input < b)])
  # use the row name as the new name for the output
  tmp_date <- rownames(input)[1]
  # rename column
  colnames(efficient) <- tmp_date
  # export to list under new name
  stockpicks[[tmp_date]] <- efficient
}
Upvotes: 0
Views: 733
Reputation: 1549
To expand on Eric's comment, you could use quantile with cut. For example, given a vector of data or a row of a matrix, v, you could do something like:
v = rnorm(1000)
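# include.lowest = TRUE keeps the minimum of v in the first bin instead of returning NA for it
cut(v, breaks = quantile(v, probs = (0:10)/10), include.lowest = TRUE)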
which will give you a factor with 10 levels based on the deciles as break points.
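To connect this to the column names asked about in the question, here is a minimal sketch under some assumptions: `v` is a hypothetical named vector standing in for one row of the data (the names `stock1`, `stock2`, ... are made up), and its values are distinct so that the decile break points are unique, which `cut` requires.

```r
# Hypothetical named vector standing in for one row of the data
set.seed(1)
v <- setNames(runif(25), paste0("stock", 1:25))

# Decile break points; include.lowest = TRUE keeps the minimum in the first bin
breaks <- quantile(v, probs = (0:10) / 10)
decile <- cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Names grouped by decile: "1" holds the lowest 10%, "10" the highest 10%
groups <- split(names(v), decile)
groups[["1"]]
```

Each name lands in exactly one group, so the deciles do not overlap even though 25 is not a multiple of 10; the groups simply differ in size.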
Based on the updated question you could do something like the following:
d = as.matrix(data)
lapply(1:nrow(d), function(i) colnames(d)[d[i,] < quantile(d[i,],.1)])
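You could also use apply on d directly with MARGIN = 1, but this would cause a problem if the number of values in the bottom 10% differs between rows: apply simplifies its result to a vector or matrix when every row returns the same number of names and falls back to a list otherwise, so the shape of the output is not predictable. It works on your minimal example but may not give the expected answer on a larger data frame.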
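The same idea could be extended from the bottom 10% to every decile of every row. The following is only a sketch: the small data frame is a hypothetical stand-in for `data` (3 rows, 12 made-up columns `S1` to `S12`), and it assumes no ties within a row that would make the quantile break points non-unique, since `cut` would then raise an error.

```r
# Hypothetical stand-in for `data`: 3 rows, 12 columns (not a multiple of 10)
set.seed(42)
data <- as.data.frame(matrix(runif(36), nrow = 3,
                             dimnames = list(NULL, paste0("S", 1:12))))

d <- as.matrix(data)

# For each row, bin the values by that row's own deciles,
# then group the column names by bin
decile_names <- lapply(seq_len(nrow(d)), function(i) {
  vals <- d[i, ]
  bins <- cut(vals, breaks = quantile(vals, probs = (0:10) / 10),
              include.lowest = TRUE, labels = FALSE)
  # fixing the levels to 1:10 keeps all ten deciles in the result, even empty ones
  split(colnames(d), factor(bins, levels = 1:10))
})

decile_names[[1]][["1"]]   # columns in the lowest decile of row 1
```

Using lapply keeps the result a list of lists regardless of how many names fall in each decile, which avoids the simplification issue with apply.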
Upvotes: 2
Reputation: 10483
Here is how you can use quantile to get what you want:
set.seed(0)
x <- as.integer(rnorm(1000, 100, 50))
quantile(x, probs = seq(0, 1, .1))
Output will be:
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
-61.0  35.0  54.0  71.7  85.0  96.5 109.0 126.0 142.2 164.0 263.0
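Once the break points are known, each value still has to be assigned to a decile. One possible way, sketched here with the same simulated `x`, is findInterval, which tolerates repeated break points (useful with integer data). Note the intervals are closed on the left, so a value equal to a break point goes to the higher decile; the variable names `q` and `decile` are illustrative.

```r
set.seed(0)
x <- as.integer(rnorm(1000, 100, 50))

# Decile break points, as above
q <- quantile(x, probs = seq(0, 1, .1))

# Decile index (1 to 10) for every value; rightmost.closed = TRUE
# puts the maximum into the 10th decile instead of an 11th interval
decile <- findInterval(x, q, rightmost.closed = TRUE)

table(decile)   # number of values per decile
```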
Upvotes: 0