I have this column in a dataframe, df: 10 12 75 89 25 100 67 89 4 67 120.2 140.5 170.5 78.1 I am trying to Calculate the 10%,20%,30%,...,90% percentile values of the range of df. Assign a value of 1-10 based on which percentile the values fall in. (eg. if 10 is in the 20%-30% percentile range of df, I assign a value of 2 that is stored in a different column. Basically, I am trying to create an output like this (These values are rough estimates, not based on actual percentiles) 10 2 12 2 75 4 89 5 25 3 100 6 67 4 89 6 4 1 67 5 120.2 7 140.5 8 170.5 10 78.1 5 I don't know how to create these percentiles and assign values. So far I have only been able to do this quantile(df, c(.1, .2,.3,.4,.5,.6,.7,.8,.9)) Could you please help? Thanks

Reputation: 615

How can assign numbers based on percentile in R?

I have this column in a dataframe, df:

I am trying to

Calculate the 10%,20%,30%,...,90% percentile values of the range of df.
Assign a value of 1-10 based on which percentile the values fall in. (eg. if 10 is in the 20%-30% percentile range of df, I assign a value of 2 that is stored in a different column.

Basically, I am trying to create an output like this (These values are rough estimates, not based on actual percentiles)

I don't know how to create these percentiles and assign values.

So far I have only been able to do this

quantile(df, c(.1, .2,.3,.4,.5,.6,.7,.8,.9))

Could you please help?

Thanks

Upvotes: 0

Answers (4)

Laurence_jj

Reputation: 726

Or you could use the data.table way which is much much faster:

library(data.table)
d = data.table(V1 = c(10, 12, 75, 89, 25, 100, 67, 89, 4, 67, 120.2, 140.5, 170.5, 78.1))
d[, percentile:= findInterval(V1, quantile(V1, seq(0,1, by=.1), na.rm=T))]

Upvotes: 0

TooTone

Reputation: 8146

Assuming your values are in x, I would do something like

 floor((x-min(x)) / (max(x)-min(x)) *10)

The fraction (x-min(x)) / (max(x)-min(x)) says how far into the range we are. The denominator (max(x)-min(x)) gets the range, in this case 170.5 - 4 = 166.5. The numerator x-min(x) calculates how far into the range we are.

We then multiply by 10 and round down to get you a number between 0 and 10.

> floor((x-min(x)) / (max(x)-min(x))*10)
 [1]  0  0  4  5  1  5  3  5  0  3  6  8 10  4

This gives different results to what you gave in your example, but with the lowest number being 4, then for example 10 clearly falls into the 0 to 10% category not the 20 to 30% category.

Edit: if you're interested in the order of the values, you can do

floor(rank(x) / length(x) * 10)

rank gives

2.0  3.0  7.0  9.5  4.0 11.0  5.5  9.5  1.0  5.5 12.0 13.0 14.0  8.0

and then you divide by the length (14) and similarly to the above multiply by 10 and use floor

> floor(rank(x) / length(x) * 10)
 [1]  1  2  5  6  2  7  3  6  0  3  8  9 10  5

Upvotes: 1

Matthew Plourde

Reputation: 44614

This is one option:

d <- read.table(text="10
12
75
89
25
100
67
89
4
67
120.2
140.5
170.5
78.1")

transform(d, percentile=findInterval(V1, quantile(V1, seq(0,1, by=.1))))

# HEAD:
# 
#   V1 percentile
# 1 10          1
# 2 12          2
# 3 75          5
# 4 89          7
# 5 25          3
# 
#   ---  4  rows omitted ---
# 
# TAIL:
#       V1 percentile
# 10  67.0          4
# 11 120.2          9
# 12 140.5         10
# 13 170.5         11
# 14  78.1          6

Upvotes: 3

MrFlick

Reputation: 206536

It's somewhat unclear how you were calculating your breaks based on your example, but from your description, this should work

x<-c(10, 12, 75, 89, 25, 100, 67, 89, 4, 67, 120.2, 140.5, 170.5, 78.1)
qq<-cut(x, breaks=seq(min(x),max(x), length.out=11), include.lowest=T, labels=F)

Here we take the vector and split it into 10 groups with seq(min(x),max(x), length.out=11) (thereby using the range of observed values) so the groups are

1: 4-26.65
2: 20.65-37.30
3: 37.30-53.95
....
10: 170.50-153.85

And then we just use the cut command with those breaks to create the new IDs to get

 # cbind(x, qq)
          x qq
 [1,]  10.0  1
 [2,]  12.0  1
 [3,]  75.0  5
 [4,]  89.0  6
 [5,]  25.0  2
 [6,] 100.0  6
 [7,]  67.0  4
 [8,]  89.0  6
 [9,]   4.0  1
[10,]  67.0  4
[11,] 120.2  7
[12,] 140.5  9
[13,] 170.5 10
[14,]  78.1  5

If you actually wanted to use quantiles, you could use

qq<-cut(x, breaks=quantile(x, seq(0,1, length.out=11)), include.lowest=T, labels=F)

which would give

          x qq
 [1,]  10.0  1
 [2,]  12.0  2
 [3,]  75.0  5
 [4,]  89.0  7
 [5,]  25.0  3
 [6,] 100.0  8
 [7,]  67.0  4
 [8,]  89.0  7
 [9,]   4.0  1
[10,]  67.0  4
[11,] 120.2  9
[12,] 140.5 10
[13,] 170.5 10
[14,]  78.1  6

Upvotes: 0

How can assign numbers based on percentile in R?

Answers (4)

Related Questions