Reputation: 725
I'd like to generate a "percentile in the distribution" column for each existing column.
I'm unsure how to generate this percentile column for an individual series, though.
#generate data
df <- data.frame(rnorm(100, 3, 1.2),
rnorm(100, 2, 0.5),
rnorm(100, 4, 1.5),
rnorm(100, 5, 0.2),
rnorm(100, 6, 0.7))
colnames(df) <- c('a', 'b', 'c', 'd', 'e')
#failed attempt to generate new column
df$a_pct <- sapply(df$a, function(x) ecdf(x))
Upvotes: 1
Views: 329
Reputation: 887981
Calling ecdf on a vector of values returns a function, not the percentiles themselves.
str(ecdf(df$a))
#function (v)
#- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
#- attr(*, "call")= language ecdf(df$a)
To get the percentiles, apply that function to the values, i.e.
ecdf(df$a)(df$a)
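As a small illustration of what that call returns, here is a sketch on a made-up vector (the name `x` and its values are assumptions, not part of the question's data). Each result is the share of observations less than or equal to that value:

```r
# Hypothetical toy vector, chosen so the result is easy to check by hand
x <- c(10, 20, 30, 40)

Fn <- ecdf(x)   # ecdf() returns a step function built from x
pct <- Fn(x)    # evaluating it at x gives the proportion of values <= each x
print(pct)
# 0.25 0.50 0.75 1.00
```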
For multiple columns, loop through the columns with lapply/sapply:
res1 <- sapply(df, function(x) ecdf(x)(x))
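If the goal is to keep the percentiles next to the original columns, as the question's `a_pct` suggests, one possible way is to rename the result and bind it back on. This is only a sketch: the `_pct` suffix follows the question, while `set.seed` and the two-column data frame are assumptions added to keep the example short and reproducible.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
df <- data.frame(a = rnorm(100, 3, 1.2),
                 b = rnorm(100, 2, 0.5))

# sapply over a data frame returns a matrix with one column per input column
pct <- sapply(df, function(x) ecdf(x)(x))

# Suffix the names so they do not clash with the originals, then bind
colnames(pct) <- paste0(colnames(pct), "_pct")
df <- cbind(df, pct)

head(df)
```

The result has columns a, b, a_pct and b_pct, with each percentile in (0, 1].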
Upvotes: 1
Reputation: 4960
Do you have to use ecdf? Just do:
sapply(df, function(x) rowMeans(outer(x, x, `>`)))
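Note that the strict `>` counts only values below each observation, so the result can differ from ecdf, which counts values less than or equal to it. A quick sketch on a made-up vector with a tie (the names `x`, `strict` and `nonstrict` are hypothetical) shows the difference, and that `>=` appears to reproduce the ecdf values:

```r
# Hypothetical toy vector containing a tie
x <- c(10, 20, 20, 40)

strict    <- rowMeans(outer(x, x, `>`))   # share of values strictly below each x
nonstrict <- rowMeans(outer(x, x, `>=`))  # share of values at or below each x

print(strict)      # 0.00 0.25 0.25 0.75
print(nonstrict)   # 0.25 0.75 0.75 1.00
print(ecdf(x)(x))  # 0.25 0.75 0.75 1.00
```

Also keep in mind that outer builds an n-by-n matrix per column, so this approach may get expensive in memory for long columns.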
Upvotes: 1