JNevens

Reputation: 11982

Create cumulative probability density function

I have the following dataframe df in R:

      time
[1]  0.432
[2]  0.451
[3]  0.399
[4]  0.422
...
[25] 0.444

Now, I would like to add a column to this dataframe (let's call it timep) of which the elements are calculated by the following formula:

The item on row i in column timep should equal the number of elements in column time that are less than or equal to the item in column time on row i, divided by the number of rows of the dataframe.

In pseudocode: df$timep[i] <- count(df$time <= df$time[i])/nrow(df)

However, I don't know how to express this correctly in R.

Upvotes: 0

Views: 724

Answers (1)

user2285236

R has a built-in function for the empirical CDF: ecdf.

Let's say you have a dataframe df:

df <- data.frame(time = c(0.432, 0.451, 0.399, 0.422, 0.444))

You can create an empirical CDF with ecdf, which returns a function:

P <- ecdf(df$time)

Now, if you pass a value to P, it returns the cumulative probability for that value, i.e. the proportion of observations in df$time that are less than or equal to it:

df$cdf <- P(df$time)

Out:

   time cdf
1 0.432 0.6
2 0.451 1.0
3 0.399 0.2
4 0.422 0.4
5 0.444 0.8
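
For comparison, the formula from the question can also be written out directly. This is only a sketch on the same five-row example; the column names timep and timep_rank are illustrative. Note that nrow(df) gives the number of rows, whereas length(df) on a data frame gives the number of columns.

```r
df <- data.frame(time = c(0.432, 0.451, 0.399, 0.422, 0.444))

# For each value, count the observations less than or equal to it
# and divide by the number of rows.
df$timep <- sapply(df$time, function(x) sum(df$time <= x) / nrow(df))

# Alternative using ranks; ties.method = "max" makes tied values
# count as "less than or equal", matching the definition above.
df$timep_rank <- rank(df$time, ties.method = "max") / nrow(df)

# Both should agree with the result of ecdf.
all.equal(df$timep, ecdf(df$time)(df$time))
all.equal(df$timep, df$timep_rank)
```

For a column of this size any of the three approaches works; ecdf is the shortest and also gives you a reusable function for values not in the data.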

Upvotes: 3
