Alex Bădoi

Reputation: 830

How to split a row into % deciles?

Up until now I have selected values by their absolute size: for example, given a row of n numbers, I would pick the numbers that lie between a and b. What I actually need is to select by percentile, i.e. the values that lie between the a% and b% quantiles of the row.

I have been using this:

    a <- 0.05
    b <- 0.4

    colnames(data[,which(data > a & data < b)])

What I need is to split each row into deciles: the highest 10% of values, then the ones that lie between 10% and 20%, and so on up to 90%-100%. Values must not overlap between deciles, and the number of values in my data is not exactly divisible by 10.

EDIT: I have the following chunk of my data:

  dput(data)
structure(list(AN8068571086 = c(0.501692168, 0.197414678, 0.415273482, 
0.3078506, 0.36441391, 0.492483978, 0.398119861, 0.501925374, 
0.660172121, 0.379188187), BMG3223R1088 = c(0.402426587, 0.214836776, 
0.328226835, 0.265325336, 0.25724501, 0.396151915, 0.377199761, 
0.31474308, 0.484177362, 0.412847814), BMG4388N1065 = c(0.592822703, 
0.308105268, 0.374769701, 0.563959456, 0.335778936, 0.455266056, 
0.510205508, 0.384208097, 0.460911179, 0.408350205), BMG6359F1032 = c(0.41153064, 
0.221527294, 0.37383843, 0.329890556, 0.356333922, 0.397373547, 
0.387519253, 0.424925141, 0.578383479, 0.411399158), BMG7496G1033 = c(0.478470376, 
0.222667989, 0.33437412, 0.352835697, 0.299427154, 0.573123951, 
0.466177145, 0.447775951, 0.477199807, 0.514107898), BMG812761002 = c(0.317522103, 
0.265366064, 0.397487594, 0.348840651, 0.428338929, 0.282390173, 
0.571658903, 0.450001013, 0.864445892, 0.418532333), CA88157K1012 = c(0.512859762, 
0.183395043, 0.36847587, 0.364320833, 0.41197194, 0.628829565, 
0.357019295, 0.341567448, 0.536733877, 0.343791549), CH0044328745 = c(0.499076264, 
0.203778437, 0.310663532, 0.288884148, 0.247539664, 0.293768434, 
0.348647329, 0.171457967, 0.391893463, 0.520079294), CH0048265513 = c(0.392308285, 
0.245092722, 0.406807313, 0.338218477, 0.337216158, 0.396477472, 
0.444780447, 0.513073443, 0.5655301, 0.372365682), GB00B4VLR192 = c(0.371059427, 
0.243691452, 0.382559417, 0.36669396, 0.331187524, 0.336644629, 
0.386660867, 0.408767967, 0.570252986, 0.350705351)), .Names = c("AN8068571086", 
"BMG3223R1088", "BMG4388N1065", "BMG6359F1032", "BMG7496G1033", 
"BMG812761002", "CA88157K1012", "CH0044328745", "CH0048265513", 
"GB00B4VLR192"), row.names = c(NA, -10L), class = "data.frame")

The process should work as follows: (1) loop across rows, (2) find the lowest 10% of values, (3) get the colnames of the columns holding those values and store them in a list. The code below is what I had before; it searches for the column names whose row value lies between a and b. All I need is the column names, not the actual values from the row.

    stockpicks <- list()
    a <- 0.3
    b <- 0.7

    for (i in 1:nrow(data)) {

      # take the i-th row as a 1 x ncol matrix
      input <- as.matrix(data[i, ])

      # extract colnames of values between a and b
      efficient <- matrix(colnames(data)[which(input > a & input < b)])

      # use the row name as the name for the output
      tmp_date <- head(rownames(input), n = 1)

      # rename column
      colnames(efficient) <- tmp_date

      # export to list under new name
      stockpicks[[tmp_date]] <- efficient

    }

Upvotes: 0

Views: 733

Answers (2)

jamieRowen

Reputation: 1549

To expand on Eric's comment, you could use quantile with cut. For example, given a vector of data v (or a row of a matrix), you could do something like

    v = rnorm(1000)
    # include.lowest = TRUE keeps the minimum in the first bin instead of NA
    cut(v, breaks = quantile(v, probs = (0:10)/10), include.lowest = TRUE)

which will give you a factor with 10 levels based on the deciles as break points.
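As a rough sketch of how that factor could be turned into column names per decile, the snippet below applies the same cut/quantile idea to one row. The named vector `v` and its values are made up for illustration; its names stand in for the column names of a row of the asker's data.

```r
# Hypothetical row: 10 named values (the names stand in for column names)
v <- c(A = 0.50, B = 0.20, C = 0.42, D = 0.31, E = 0.36,
       F = 0.49, G = 0.40, H = 0.51, I = 0.66, J = 0.38)

# Decile break points of this row
breaks <- quantile(v, probs = (0:10) / 10)

# Integer bin codes 1..10; include.lowest = TRUE keeps the minimum in bin 1
bins <- cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Group the names by decile (1 = lowest 10%, 10 = highest 10%)
by_decile <- split(names(v), bins)
by_decile[["1"]]   # names in the lowest decile
```

Note that cut stops with an error if two break points coincide, which can happen when a row contains many tied values, so this sketch assumes the deciles of the row are distinct.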

Edit

Based on the updated question you could do something like the following:

    d = as.matrix(data)
    lapply(1:nrow(d), function(i) colnames(d)[d[i,]  < quantile(d[i,],.1)])

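You could also use apply on d directly with MARGIN = 1, but apply simplifies its result to a vector or matrix when every row returns the same number of names and returns a list otherwise, so the shape of the output depends on the data. It works on your minimal example but may not give the expected answer on a larger data frame, where rows can differ in how many values fall in the bottom 10%.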
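The lapply call above returns only the bottom 10% of each row. One possible way to get all ten groups per row is sketched below. It swaps in a rank-based assignment instead of quantile break points, so every value lands in exactly one group even when the number of columns is not divisible by 10. The matrix `d` here is random stand-in data, and `decile_of` is a helper name invented for this example.

```r
# Hypothetical stand-in data: 3 rows, 13 columns (13 is not divisible by 10)
set.seed(1)
d <- matrix(runif(3 * 13), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), paste0("S", 1:13)))

# Map each value of a row to a group 1..10 by its rank;
# ties.method = "first" breaks ties so no value is assigned twice
decile_of <- function(x) {
  ceiling(10 * rank(x, ties.method = "first") / length(x))
}

# For every row: a list of 10 character vectors of column names
picks <- lapply(seq_len(nrow(d)), function(i) {
  groups <- factor(decile_of(d[i, ]), levels = 1:10)
  split(colnames(d), groups)
})
names(picks) <- rownames(d)

picks[["r1"]][["1"]]   # columns in the lowest group of row r1
```

Because lapply always returns a list, the result has the same shape for every row regardless of how many names fall in each group. Rows with fewer than 10 values would simply leave some groups empty.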

Upvotes: 2

Gopala

Reputation: 10483

Here is how you can use quantile to get what you want:

    set.seed(0)
    x <- as.integer(rnorm(1000, 100, 50))
    quantile(x, probs = seq(0, 1, .1))

Output will be:

   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
-61.0  35.0  54.0  71.7  85.0  96.5 109.0 126.0 142.2 164.0 263.0 
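To go from these break points to a decile label for each value, one option is findInterval. This is a minimal sketch, not part of the answer as posted; the variable names `breaks` and `decile` are chosen here for illustration.

```r
set.seed(0)
x <- as.integer(rnorm(1000, 100, 50))

# Decile break points, as in the quantile call above
breaks <- quantile(x, probs = seq(0, 1, .1))

# Bin index 1..10 for every value; intervals are closed on the left, and
# rightmost.closed = TRUE keeps the maximum in bin 10 instead of bin 11
decile <- findInterval(x, breaks, rightmost.closed = TRUE)

# Number of values that fall in each bin
table(decile)
```

Since x is integer-valued, several values can sit exactly on a break point, so the bins need not hold exactly 100 values each, but every value is assigned to exactly one bin.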

Upvotes: 0
