Luis
Luis

Reputation: 1584

Improve coding -- using tidyverse to compute confidence intervals (for proportions) and add the results to individual rows

I'm here again to ask the colleagues to help me to improve this code. Since this forum is so receptive, I would like to hear your thoughts about this situation!

The question is pretty simple but needs knowledge about looping / for() / by() etc and I'm almost autodidact when programming in R:

I'm working on a dataset where I have estimates for proprtions and I want to compute the lower and upper boundaries for the confidence interval. The work consists of transforming this:

Original dataset

to this:

enter image description here

There is a package that computes that for me

library(PropCIs)

The solution was intuitive (for me). 1) Get every point estimate,
2) compute its lower and upper boundaries and
3) move to the next row and
4) Return to the first step.

My workaround was based on this post here and is described below step by step. However, I imagine this solution is too slow, naive or works like I'm talking with a strange accent to any native data scientist. Then, I'm wondering if tidyverse could help me to improve that.

library(PropCIs)
library(tidyverse)
set.seed(123)
ds <- data.frame(estimate = runif(15, min=0, max=1), 
                 sample = sample(x = 10:15, 15, replace = T))

ds <- ds %>% mutate(lower = '')  

#looping
for(i in 1:nrow(ds)) { 
  ds$lower[i] <- blakerci(ds$sample[i], 3449, conf.level=0.95)
}

#row to columns
ds <- separate(data = ds, col = lower, into = c("lower", "upper"), sep = ",")

#replace strings
ds <- ds %>% mutate(lower = gsub("c(", "", lower, fixed = TRUE),
                  upper = gsub(")", "", upper, fixed = TRUE))

#Transform to numeric
ds <- ds %>% mutate_at(vars(lower, upper), funs(as.numeric(.)))

As always, thanks much for all support!

Please take into consideration this post has a reproducible script and can help other people! =)

Upvotes: 0

Views: 359

Answers (1)

Jiaxiang
Jiaxiang

Reputation: 883

I don't use PropCIs before, but it seems cool to think in probability. Here is the code in tidyverse style.

library(PropCIs)
library(tidyverse)
library(data.table)
#> 
#> 载入程辑包:'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
set.seed(123)
data.table(
    # data.table is faster.
    estimate = runif(15, min=0, max=1)
    ,sample = sample(x = 10:15, 15, replace = T)
) %>% 
    mutate(
        ci = map(sample
                    ,~blakerci(., 3449, conf.level=0.95) %>% 
                    .$conf.int %>% 
                    .[1:2]
                    # get lower and upper
                    )
        # why n = 3449? Any assumption?
    ) %>% 
    mutate(
        lower = map_dbl(ci,~.[1])
        ,upper = map_dbl(ci,~.[2])
    ) %>% 
    select(-ci)
#>     estimate sample       lower       upper
#> 1  0.2875775     15 0.002476132 0.007123028
#> 2  0.7883051     11 0.001593144 0.005669411
#> 3  0.4089769     10 0.001481219 0.005245599
#> 4  0.8830174     11 0.001593144 0.005669411
#> 5  0.9404673     15 0.002476132 0.007123028
#> 6  0.0455565     15 0.002476132 0.007123028
#> 7  0.5281055     14 0.002350896 0.006691190
#> 8  0.8924190     13 0.002038424 0.006396837
#> 9  0.5514350     15 0.002476132 0.007123028
#> 10 0.4566147     13 0.002038424 0.006396837
#> 11 0.9568333     14 0.002350896 0.006691190
#> 12 0.4533342     13 0.002038424 0.006396837
#> 13 0.6775706     13 0.002038424 0.006396837
#> 14 0.5726334     11 0.001593144 0.005669411
#> 15 0.1029247     10 0.001481219 0.005245599

Created on 2018-11-07 by the reprex package (v0.2.1)

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  zh_CN.UTF-8                 
#>  tz       Asia/Shanghai               
#>  date     2018-11-07
#> Packages -----------------------------------------------------------------
#>  package    * version date       source         
#>  assertthat   0.2.0   2017-04-11 CRAN (R 3.5.0) 
#>  backports    1.1.2   2017-12-13 CRAN (R 3.5.0) 
#>  base       * 3.5.1   2018-07-05 local          
#>  bindr        0.1.1   2018-03-13 CRAN (R 3.5.0) 
#>  bindrcpp   * 0.2.2   2018-03-29 CRAN (R 3.5.0) 
#>  broom        0.5.0   2018-07-17 CRAN (R 3.5.0) 
#>  cellranger   1.1.0   2016-07-27 CRAN (R 3.5.0) 
#>  cli          1.0.0   2017-11-05 CRAN (R 3.5.0) 
#>  colorspace   1.3-2   2016-12-14 CRAN (R 3.5.0) 
#>  compiler     3.5.1   2018-07-05 local          
#>  crayon       1.3.4   2017-09-16 CRAN (R 3.5.0) 
#>  data.table * 1.11.8  2018-09-30 cran (@1.11.8) 
#>  datasets   * 3.5.1   2018-07-05 local          
#>  devtools     1.13.6  2018-06-27 CRAN (R 3.5.0) 
#>  digest       0.6.16  2018-08-22 cran (@0.6.16) 
#>  dplyr      * 0.7.6   2018-06-29 CRAN (R 3.5.1) 
#>  evaluate     0.11    2018-07-17 CRAN (R 3.5.0) 
#>  forcats    * 0.3.0   2018-02-19 CRAN (R 3.5.0) 
#>  ggplot2    * 3.0.0   2018-07-03 CRAN (R 3.5.0) 
#>  glue         1.3.0   2018-07-17 CRAN (R 3.5.0) 
#>  graphics   * 3.5.1   2018-07-05 local          
#>  grDevices  * 3.5.1   2018-07-05 local          
#>  grid         3.5.1   2018-07-05 local          
#>  gtable       0.2.0   2016-02-26 CRAN (R 3.5.0) 
#>  haven        1.1.2   2018-06-27 CRAN (R 3.5.0) 
#>  hms          0.4.2   2018-03-10 CRAN (R 3.5.0) 
#>  htmltools    0.3.6   2017-04-28 CRAN (R 3.5.0) 
#>  httr         1.3.1   2017-08-20 CRAN (R 3.5.0) 
#>  jsonlite     1.5     2017-06-01 CRAN (R 3.5.0) 
#>  knitr        1.20    2018-02-20 CRAN (R 3.5.0) 
#>  lattice      0.20-35 2017-03-25 CRAN (R 3.5.1) 
#>  lazyeval     0.2.1   2017-10-29 CRAN (R 3.5.0) 
#>  lubridate    1.7.4   2018-04-11 CRAN (R 3.5.0) 
#>  magrittr     1.5     2014-11-22 CRAN (R 3.5.0) 
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.5.0) 
#>  methods    * 3.5.1   2018-07-05 local          
#>  modelr       0.1.2   2018-05-11 CRAN (R 3.5.0) 
#>  munsell      0.5.0   2018-06-12 CRAN (R 3.5.0) 
#>  nlme         3.1-137 2018-04-07 CRAN (R 3.5.1) 
#>  pillar       1.3.0   2018-07-14 CRAN (R 3.5.0) 
#>  pkgconfig    2.0.1   2017-03-21 CRAN (R 3.5.0) 
#>  plyr         1.8.4   2016-06-08 CRAN (R 3.5.0) 
#>  PropCIs    * 0.3-0   2018-02-23 CRAN (R 3.5.0) 
#>  purrr      * 0.2.5   2018-05-29 CRAN (R 3.5.0) 
#>  R6           2.3.0   2018-10-04 cran (@2.3.0)  
#>  Rcpp         0.12.19 2018-10-01 cran (@0.12.19)
#>  readr      * 1.1.1   2017-05-16 CRAN (R 3.5.0) 
#>  readxl       1.1.0   2018-04-20 CRAN (R 3.5.0) 
#>  rlang        0.2.2   2018-08-16 cran (@0.2.2)  
#>  rmarkdown    1.10    2018-06-11 CRAN (R 3.5.0) 
#>  rprojroot    1.3-2   2018-01-03 CRAN (R 3.5.0) 
#>  rvest        0.3.2   2016-06-17 CRAN (R 3.5.0) 
#>  scales       1.0.0   2018-08-09 CRAN (R 3.5.0) 
#>  stats      * 3.5.1   2018-07-05 local          
#>  stringi      1.2.4   2018-07-20 CRAN (R 3.5.0) 
#>  stringr    * 1.3.1   2018-05-10 CRAN (R 3.5.0) 
#>  tibble     * 1.4.2   2018-01-22 CRAN (R 3.5.0) 
#>  tidyr      * 0.8.1   2018-05-18 CRAN (R 3.5.0) 
#>  tidyselect   0.2.5   2018-10-11 cran (@0.2.5)  
#>  tidyverse  * 1.2.1   2017-11-14 CRAN (R 3.5.0) 
#>  tools        3.5.1   2018-07-05 local          
#>  utils      * 3.5.1   2018-07-05 local          
#>  withr        2.1.2   2018-03-15 CRAN (R 3.5.0) 
#>  xml2         1.2.0   2018-01-24 CRAN (R 3.5.0) 
#>  yaml         2.2.0   2018-07-25 CRAN (R 3.5.0)

Upvotes: 1

Related Questions