Reputation: 7
As a relative beginner to R, I am having difficulties. My goal is to bootstrap the individual coefficient of variation (CV) and write the results to a new data frame for further calculations and analysis, e.g. 1000 bootstrapped CVs for each individual, based on that individual's own variation in the data. Here is how far I got before I ran into a problem I cannot solve. I have searched online, including here, but I either could not find a solution or did not recognize one when I saw it, even though it is most probably out there somewhere. If so, please point me in that direction.
I have a dataset with repeated observations on several individuals, but the individuals do not all have the same number of observations, as seen in the data below:
Subject.id Moderate
1 943
1 1132
1 347
1 1100
1 1265
2 1297
2 888
2 1005
2 1211
2 1338
2 1238
2 916
2 541
2 613
2 692
2 1538
2 1071
3 670
3 864
3 1189
3 320
I'm trying to bootstrap the within-individual coefficient of variation using the boot package. My boot function looks like this:
boot.f<-function(d, i){
d2 <- d[i,]
return(sqrt(var(d2$moderate))/mean(d2$moderate))
}
And it runs perfectly fine like this:
boot1<-boot(df, boot.f, 1000)
However, when I try and use the strata argument like this:
boot1<-boot(df, boot.f, 1000, strata=subject.id)
I get the following error message:
Error in tapply(seq_len(n), as.numeric(strata)) :
  arguments must have same length
In addition: Warning message:
In tapply(seq_len(n), as.numeric(strata)) : NAs introduced by coercion
So my question is: how can I tweak my function so that I preserve the within-subject information and end up with output that looks like what I got from the summaryBy function, except times a thousand?
summaryBy(moderate~subject_id, data=df, FUN=CV)
subject.id moderate.CV
1 2001 0.3831299
2 2002 0.4972260
3 2003 0.5095434
4 2004 0.2730478
5 2005 0.3645640
6 2006 0.3727822
7 2007 0.3858968
8 2008 0.5833114
9 2009 0.5896946
10 2013 0.4247119
11 2014 0.3016552
12 2015 0.4670444
13 2016 0.3995908
14 2018 0.3908963
15 2019 0.3660683
16 2020 0.3373719
17 2022 0.5020418
18 2023 0.3848056
19 2024 0.6410266
20 2025 0.7070671
21 2026 0.3925212
22 2028 0.1879174
23 2029 0.2912984
24 2030 0.3534441
25 2031 0.2238960
26 2032 0.7491192
27 2033 0.5775261
Upvotes: 0
Views: 694
Reputation: 3888
I have no problem running the following:
library(boot)
df<-read.table(path.to.your.data, header=TRUE)  # header=TRUE so the first row supplies the column names
boot.f<-function(d, i){
d2 <- d[i,]
return(sqrt(var(d2$moderate))/mean(d2$moderate))
}
boot(df, boot.f, 1000)
boot(df, boot.f, 1000, strata=df$subject.id)
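Note that strata is passed as the column df$subject.id, not as a bare subject.id. These are the variable names I used (since you switch between upper- and lowercase letters):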
head(df,3)
subject.id moderate
1 1 943
2 1 1132
3 1 347
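One caveat: the strata argument only changes how rows are resampled (within each subject). The statistic function still returns a single CV computed over all rows, so the stratified call alone does not give 1000 CVs per individual. A minimal sketch of one way to get per-subject replicates is to split the data by subject and call boot once per subject. It assumes the lowercase column names subject.id and moderate and the example rows from the question; cv.f, boots, cv.reps and cv.df are names made up here for illustration.

```r
library(boot)

# Example rows copied from the question (column names lowercased: an assumption)
df <- data.frame(
  subject.id = rep(1:3, times = c(5, 12, 4)),
  moderate = c(943, 1132, 347, 1100, 1265,
               1297, 888, 1005, 1211, 1338, 1238, 916,
               541, 613, 692, 1538, 1071,
               670, 864, 1189, 320)
)

# Statistic: coefficient of variation of the resampled values of one subject
cv.f <- function(x, i) sd(x[i]) / mean(x[i])

set.seed(1)  # only to make the sketch reproducible

# One boot object per subject, each resampling that subject's own observations
boots <- lapply(split(df$moderate, df$subject.id), boot,
                statistic = cv.f, R = 1000)

# Collect the replicates: one column per subject, one row per bootstrap draw
cv.reps <- sapply(boots, function(b) b$t[, 1])
cv.df <- as.data.frame(cv.reps)
```

Each column of cv.df then holds the 1000 bootstrapped CVs for one subject, and boots[[k]]$t0 is the observed CV for subject k. With only four or five observations per subject, the replicates will be quite variable.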
Upvotes: 1