shoo

Reputation: 87

A loop in R to count the number of points outside the estimated curve

How can I write a loop in R to calculate the percentage of points that fall outside the pink line for each one-unit age interval (1-2, 2-3, 3-4, ..., 18-19)? For example, I want to count how many points in the age interval 1-2 have a higher value than the estimated pink curve for that interval, and then calculate the percentage (the number of points with a value higher than the estimated value, divided by the total number of observations in that interval). I need to do this for every one-unit age interval (1-2, 2-3, 3-4, 4-5, 5-6, 6-7, ..., 17-18, 18-19).

For example:

   Age     Value     estimated Value 
    1.5     12          12
    1.5     12          14
    1.7     13          15
    1.8     14          9 
    2.1     12          15
    2.2     14          16
    2.3     14          13
    3       8           8.1
    4       9           9.1
    4.1     5           6.1
    4.2     5           12
    5       14          15

The result should be something like
Age:                          1-2    2-3    3-4  4-5
number of points *outside*     1      1 
percentage                     1/4    1/3                 

My initial code (I need to turn it into a loop so that I get the results for all age intervals):

a=1
b=2
A<-subset(Data, Age>=a & Age<b)
sum(A$Value > A$EstimatedValue)/nrow(A)


Upvotes: 0

Views: 64

Answers (1)

Gregor Thomas

Reputation: 146100

Using dplyr:

library(dplyr)
dd %>%
  mutate(age_bin = cut(Age, breaks = 0:20)) %>%
  group_by(age_bin) %>%
  summarize(n_points = n(),
            n_over_estimate = sum(Value > estimated_Value),
            pct_over_estimate = n_over_estimate / n_points * 100)
#   age_bin n_points n_over_estimate pct_over_estimate
#   <fct>      <int>           <int>             <dbl>
# 1 (1,2]          4               1                25
# 2 (2,3]          4               1                25
# 3 (3,4]          1               0                 0
# 4 (4,5]          3               0                 0

And this sample data:

dd = read.table(text = "Age     Value     estimated_Value 
    1.5     12          12
    1.5     12          14
    1.7     13          15
    1.8     14          9 
    2.1     12          15
    2.2     14          16
    2.3     14          13
    3       8           8.1
    4       9           9.1
    4.1     5           6.1
    4.2     5           12
    5       14          15", header = TRUE)
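
If you do want an explicit loop, as the question asks, here is a minimal base R sketch. It assumes the same `dd` sample data and column names as above; the names `lowers`, `result`, `in_bin`, `n_over` and `pct_over` are made up for illustration. It uses the intervals from the question's `subset()` call (`Age >= a & Age < b`), so an age of exactly 3 lands in 3-4 rather than 2-3.

```r
# Sample data, repeated here so the sketch runs on its own.
dd <- read.table(text = "Age Value estimated_Value
1.5 12 12
1.5 12 14
1.7 13 15
1.8 14 9
2.1 12 15
2.2 14 16
2.3 14 13
3 8 8.1
4 9 9.1
4.1 5 6.1
4.2 5 12
5 14 15", header = TRUE)

# Lower bounds of the one-unit intervals 1-2, 2-3, ..., 18-19.
lowers <- 1:18

# One row per interval, filled in by the loop below.
result <- data.frame(lower = lowers, upper = lowers + 1,
                     n_points = NA_integer_, n_over = NA_integer_)

for (i in seq_along(lowers)) {
  # Points with lower <= Age < lower + 1, as in the question's subset() call.
  in_bin <- dd$Age >= lowers[i] & dd$Age < lowers[i] + 1
  result$n_points[i] <- sum(in_bin)
  result$n_over[i] <- sum(dd$Value[in_bin] > dd$estimated_Value[in_bin])
}

# Percentage above the estimate; NA for intervals with no observations.
result$pct_over <- ifelse(result$n_points > 0,
                          result$n_over / result$n_points * 100,
                          NA)
result
```

The dplyr version differs at the interval edges because `cut()` builds right-closed bins like `(2,3]` by default. Passing `right = FALSE` to `cut()` gives left-closed bins that match the loop; with the sample data the 2-3 interval then contains 3 points, 1 of them above the estimate.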

Upvotes: 3
