Jim

Reputation: 725

Loop or apply to generate a percentile value in a new column for each existing column in df

I'd like to generate a "percentile in the distribution" column for each existing column.

I'm unsure how to generate this percentile column for an individual series, though.

#generate data
df <- data.frame(rnorm(100, 3, 1.2),
                 rnorm(100, 2, 0.5),
                 rnorm(100, 4, 1.5),
                 rnorm(100, 5, 0.2),
                 rnorm(100, 6, 0.7))
colnames(df) <- c('a', 'b', 'c', 'd', 'e')

#failed attempt to generate new column
df$a_pct <- sapply(df$a, function(x) ecdf(x))

Upvotes: 1

Views: 329

Answers (2)

akrun

Reputation: 887981

Calling ecdf on a vector returns a function (the empirical CDF), not the percentile values themselves.

str(ecdf(df$a))
#function (v)  
#- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
#- attr(*, "call")= language ecdf(df$a)

To get the percentiles, apply that function to the values, i.e.

ecdf(df$a)(df$a)
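
A small hand-checkable sketch of the same two-step call may help here. The vector `x` and the names `Fn` and `pct` are made up for illustration and are not part of the question's data:

```r
# Illustrative values only, chosen so the result is easy to verify by hand
x <- c(10, 20, 30, 40)

Fn <- ecdf(x)   # Fn is a function: Fn(v) is the share of x that is <= v
pct <- Fn(x)    # evaluate the function at the original values

pct
# 0.25 0.50 0.75 1.00
```

Each result is the fraction of observations less than or equal to that value, so the largest value always maps to 1.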

For multiple columns, loop through the columns with lapply/sapply:

res1 <- sapply(df, function(x) ecdf(x)(x))
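
Since the question asks for new columns next to the existing ones, one possible way to attach the result is sketched below. The `_pct` suffix follows the question's `a_pct` attempt; the seed, the two-column data frame, and the names `pct` and `df_out` are assumptions for the example:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(a = rnorm(100, 3, 1.2),
                 b = rnorm(100, 2, 0.5))

# One percentile vector per column; lapply keeps the column names
pct <- as.data.frame(lapply(df, function(x) ecdf(x)(x)))

# Rename to a_pct, b_pct, ... and bind next to the original columns
names(pct) <- paste0(names(df), "_pct")
df_out <- cbind(df, pct)

head(df_out)
```

Using `lapply` plus `as.data.frame` rather than `sapply` avoids an intermediate matrix, which is a matter of taste here because every column is numeric.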

Upvotes: 1

AdamO

Reputation: 4960

Do you have to use ecdf? You can compute the proportion of values strictly below each value directly:

sapply(df, function(x) rowMeans(outer(x, x, `>`)))
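
As a rough check of how this relates to the ecdf approach, here is a sketch on a made-up vector (`x`, `strict`, and `weak` are illustrative names). With `>` each value gets the share of observations strictly below it, while `>=` reproduces what `ecdf(x)(x)` returns:

```r
# Illustrative values only
x <- c(10, 20, 30, 40)

strict <- rowMeans(outer(x, x, `>`))   # share of values strictly below each x
weak   <- rowMeans(outer(x, x, `>=`))  # share of values at or below each x

strict
# 0.00 0.25 0.50 0.75
weak
# 0.25 0.50 0.75 1.00

all.equal(weak, ecdf(x)(x))
# TRUE
```

Note that `outer` builds an n-by-n matrix, so this approach uses memory quadratic in the number of rows.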

Upvotes: 1
