Reputation: 11982
I have the following data frame df in R:
time
[1] 0.432
[2] 0.451
[3] 0.399
[4] 0.422
...
[25] 0.444
Now, I would like to add a column to this data frame (let's call it timep) whose elements are calculated by the following formula:
The item on row i in column timep should be equal to the number of elements in column time that are less than or equal to the item in column time on row i, divided by the number of rows of the data frame. In pseudocode:
df$timep[i] <- count(df$time <= df$time[i])/length(df)
Only, I don't really know how I can correctly express this in R.
Upvotes: 0
Views: 724
Reputation:
R has a built-in empirical CDF function, ecdf.
Let's say you have a data frame df:
df <- data.frame(time = c(0.432, 0.451, 0.399, 0.422, 0.444))
You can create an empirical cdf with:
P <- ecdf(df$time)
Now, if you pass a value (or a vector of values) to P, it will return the cumulative probability for each value:
df$cdf <- P(df$time)
Out:
time cdf
1 0.432 0.6
2 0.451 1.0
3 0.399 0.2
4 0.422 0.4
5 0.444 0.8
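If you want to see what ecdf is doing here, the pseudocode from the question can also be translated fairly directly. This is only a sketch on the same five sample values. The column name timep comes from the question, and nrow(df) is used for the denominator because length(df) on a data frame gives the number of columns, not rows:

```r
# Same sample data as above
df <- data.frame(time = c(0.432, 0.451, 0.399, 0.422, 0.444))

# For each value x in time, count the entries that are <= x,
# then divide by the number of rows.
# Note: nrow(df), not length(df); length() of a data frame is its column count.
df$timep <- sapply(df$time, function(x) sum(df$time <= x)) / nrow(df)

# Compare against the ecdf() result; the two are expected to agree
all.equal(df$timep, ecdf(df$time)(df$time))
```

For larger data the ecdf version is probably the better choice, since the sapply approach compares every value against every other value.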
Upvotes: 3