Reputation: 87
How can I write a loop in R to calculate the percentage of points that fall outside (above) the pink line for each one-unit age interval (1-2, 2-3, 3-4, ..., 18-19)? For example, I want to count how many points in the age interval 1-2 have a higher value than the estimated pink curve for that interval, and then calculate the percentage (the number of points with a value higher than the estimated value, divided by the total number of observations in that interval). I need to do this for every one-unit age interval (1-2, 2-3, 3-4, 4-5, 5-6, 6-7, ..., 17-18, 18-19).
For example:
Age Value EstimatedValue
1.5 12 12
1.5 12 14
1.7 13 15
1.8 14 9
2.1 12 15
2.2 14 16
2.3 14 13
3 8 8.1
4 9 9.1
4.1 5 6.1
4.2 5 12
5 14 15
The result should be something like
Age: 1-2 2-3 3-4 4-5
number of points *outside* 1 1
percentage 1/4 1/3
My initial code (I need to turn it into a loop to get the results for all age intervals):
a=1
b=2
A<-subset(Data, Age>=a & Age<b)
sum(A$Value > A$EstimatedValue)/nrow(A)
Upvotes: 0
Views: 64
Reputation: 146100
Using dplyr:
library(dplyr)
dd %>%
mutate(age_bin = cut(Age, breaks = 0:20)) %>%
group_by(age_bin) %>%
summarize(n_points = n(),
n_over_estimate = sum(Value > estimated_Value),
pct_over_estimate = n_over_estimate / n_points * 100)
# age_bin n_points n_over_estimate pct_over_estimate
# <fct> <int> <int> <dbl>
# 1 (1,2] 4 1 25
# 2 (2,3] 4 1 25
# 3 (3,4] 1 0 0
# 4 (4,5] 3 0 0
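One caveat on the bins: `cut()` builds right-closed intervals by default, so Age 3 is counted in `(2,3]` above, whereas the question's `Age>=a & Age<b` filter would put it in 3-4. The following is a minimal sketch of the left-closed variant, assuming the same column names as the sample data; `right = FALSE` is a standard argument of base R's `cut()`, and the `res` name is just for illustration.

```r
library(dplyr)

# Same sample data as in this answer, written out as a data frame
dd <- data.frame(
  Age             = c(1.5, 1.5, 1.7, 1.8, 2.1, 2.2, 2.3, 3, 4, 4.1, 4.2, 5),
  Value           = c(12, 12, 13, 14, 12, 14, 14, 8, 9, 5, 5, 14),
  estimated_Value = c(12, 14, 15, 9, 15, 16, 13, 8.1, 9.1, 6.1, 12, 15)
)

res <- dd %>%
  # right = FALSE gives bins [0,1), [1,2), ... matching Age >= a & Age < b
  mutate(age_bin = cut(Age, breaks = 0:20, right = FALSE)) %>%
  group_by(age_bin) %>%
  summarize(n_points = n(),
            n_over_estimate = sum(Value > estimated_Value),
            pct_over_estimate = n_over_estimate / n_points * 100)

print(res)
```

With the sample data this gives 1 of 4 points for 1-2 and 1 of 3 points for 2-3, which matches the expected result in the question.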
And this sample data:
dd = read.table(text = "Age Value estimated_Value
1.5 12 12
1.5 12 14
1.7 13 15
1.8 14 9
2.1 12 15
2.2 14 16
2.3 14 13
3 8 8.1
4 9 9.1
4.1 5 6.1
4.2 5 12
5 14 15", header = TRUE)
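Since the question asks specifically for a loop, here is a base R sketch that wraps the question's own `subset()` code in a function and runs it over every interval. The helper name `pct_over` and the `result` data frame are hypothetical, and the column is assumed to be called `estimated_Value` as in the sample data; intervals with no observations are returned as `NA` rather than dividing by zero.

```r
# Same sample data as above, written out as a data frame
dd <- data.frame(
  Age             = c(1.5, 1.5, 1.7, 1.8, 2.1, 2.2, 2.3, 3, 4, 4.1, 4.2, 5),
  Value           = c(12, 12, 13, 14, 12, 14, 14, 8, 9, 5, 5, 14),
  estimated_Value = c(12, 14, 15, 9, 15, 16, 13, 8.1, 9.1, 6.1, 12, 15)
)

# Hypothetical helper: percentage of points in [a, b) above the estimate
pct_over <- function(data, a, b) {
  A <- subset(data, Age >= a & Age < b)
  if (nrow(A) == 0) return(NA_real_)  # no observations in this interval
  sum(A$Value > A$estimated_Value) / nrow(A) * 100
}

lower <- 1:18                  # lower bounds of the intervals 1-2, ..., 18-19
pct <- numeric(length(lower))
for (i in seq_along(lower)) {
  pct[i] <- pct_over(dd, lower[i], lower[i] + 1)
}

result <- data.frame(
  interval = paste(lower, lower + 1, sep = "-"),
  pct_over_estimate = pct
)
print(result)
```

The grouped dplyr version is usually shorter, but the loop keeps the question's `[a, b)` interval definition explicit and lists empty intervals as well.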
Upvotes: 3