Md Shariful Islam

Reputation: 71

Survey design related issues in R

I have joined five datasets using the full_join function from the dplyr package. The first dataset had 6,165 rows and the second had 5,827 rows. The final joined dataset has 33,503 rows. I used the following code to join the five datasets:

n2 <- full_join(n96, n01)
n3 <- full_join(n2, n06)
n4 <- full_join(n3, n11)
nf <- full_join(n4, n16)
View(nf)

The final dataset looks like the following:

 v000    v005     age  v021  v022  v023    v024    resi  region    v102 education pregnant  v445    v501    v717  wealth occupation marital  wgtv   BMI obov 
  <chr>  <dbl> <dbl+l> <dbl> <dbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+lbl> <dbl+lb> <dbl> <dbl+l> <dbl+l> <dbl+l>  <dbl+lbl> <dbl+l> <dbl> <dbl> <fct>
1 NP3   412612 6 [40-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~  2285 1 [mar~ 4 [agr~ 1 [poo~ 2 [cleric~ 1 [mar~ 0.413  22.8 0    
2 NP3   412612 3 [25-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~  2159 1 [mar~ 4 [agr~ 1 [poo~ 2 [cleric~ 1 [mar~ 0.413  21.6 0    
3 NP3   412612 4 [30-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~  2167 1 [mar~ 4 [agr~ 3 [mid~ 2 [cleric~ 1 [mar~ 0.413  21.7 0    
4 NP3   412612 5 [35-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~  2039 1 [mar~ 4 [agr~ 4 [ric~ 2 [cleric~ 1 [mar~ 0.413  20.4 0    
5 NP3   412612 2 [20-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 1 [prima~ 0 [no o~  2163 1 [mar~ 4 [agr~ 3 [mid~ 2 [cleric~ 1 [mar~ 0.413  21.6 0    
6 NP3   412612 5 [35-~   101    51     0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~  3785 1 [mar~ 4 [agr~ 2 [poo~ 2 [cleric~ 1 [mar~ 0.413  37.8 2    
# ... with 6 more variables: over <fct>, age1 <dbl+lbl>, working_status <dbl+lbl>, education1 <dbl+lbl>, year <dbl>, stra <fct>

As it is a complex survey dataset, I set up a survey design:

svs<-svydesign(id=nf$v021, strata=nf$stra, nest=TRUE, weights=nf$wgtv, data=nf)

This works. During the analysis, however, I ran into object-related errors. To fix them, I used the following code:

svs1 <- update(
  svs,
  one = 1,
  edu = factor(education, levels = c(0, 1, 2, 3),
               labels = c("no edu", "primary", "secondary", "higher")),
  wealth = factor(wealth, levels = c(1, 2, 3, 4, 5),
                  labels = c("poorest", "poorer", "middle", "richer", "richest")),
  marital = factor(marital, levels = c(0, 1),
                   labels = c("never married", "married")),
  occu = factor(occu, levels = c(0, 1, 2, 3),
                labels = c("not working",
                           "professional/technical/manageral/clerial/sale/services",
                           "agricultural",
                           "skilled/unskilled manual")),
  age1 = factor(age1, levels = c(1, 2, 3),
                labels = c("early", "mid", "late")),
  obov = factor(obov, levels = c(0, 1, 2),
                labels = c("normal", "overweight", "obese")),
  over = factor(over, levels = c(0, 1),
                labels = c("normal", "overweight/obese")),
  working_status = factor(working_status, levels = c(0, 1),
                          labels = c("not working", "working")),
  education1 = factor(education1, levels = c(0, 1, 2),
                      labels = c("no education", "primary", "secondary/secondry+")),
  resi = factor(resi, levels = c(0, 1), labels = c("urban", "rural"))
)

Now I get the following error:

Error in `[<-.data.frame`(`*tmp*`, , newnames[j], value = c(3L, 3L, 3L,  : 
  replacement has 12674 rows, data has 33503

Could you please suggest how I can fix this error?

Upvotes: 0

Views: 51

Answers (1)

Ronak Shah

Reputation: 389055

I am not sure how the update function works, but it seems you want to change the factor levels of the variables. You can do that in the nf data frame before passing it to the svydesign function.

library(dplyr)

# Recode the coded columns as labelled factors before building the design
nf <- nf %>%
  mutate(
    edu = factor(education, levels = c(0, 1, 2, 3),
                 labels = c("no edu", "primary", "secondary", "higher")),
    wealth = factor(wealth, levels = c(1, 2, 3, 4, 5),
                    labels = c("poorest", "poorer", "middle", "richer", "richest")),
    marital = factor(marital, levels = c(0, 1),
                     labels = c("never married", "married")),
    occu = factor(occu, levels = c(0, 1, 2, 3),
                  labels = c("not working",
                             "professional/technical/manageral/clerial/sale/services",
                             "agricultural",
                             "skilled/unskilled manual")),
    age1 = factor(age1, levels = c(1, 2, 3),
                  labels = c("early", "mid", "late")),
    obov = factor(obov, levels = c(0, 1, 2),
                  labels = c("normal", "overweight", "obese")),
    over = factor(over, levels = c(0, 1),
                  labels = c("normal", "overweight/obese")),
    working_status = factor(working_status, levels = c(0, 1),
                            labels = c("not working", "working")),
    education1 = factor(education1, levels = c(0, 1, 2),
                        labels = c("no education", "primary", "secondary/secondry+")),
    resi = factor(resi, levels = c(0, 1), labels = c("urban", "rural"))
  )
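
A possible explanation for the row mismatch, offered as an assumption rather than a confirmed diagnosis: the printed columns of nf include occupation but no occu, so occu may be resolved from some shorter object in the workspace instead of from the data. The sketch below uses a small made-up data frame (nf_demo, its values, and svs_demo are hypothetical; only the column names come from the question) to show one way to check which variables are missing, recode inside the data frame, and then build the design with formulas so that every design variable is taken from data.

```r
library(survey)

# Toy stand-in for nf (hypothetical values); column names follow the question.
nf_demo <- data.frame(
  v021       = c(101, 101, 102, 102, 201, 201, 202, 202),  # cluster (PSU) id
  stra       = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),          # stratum
  wgtv       = c(0.41, 0.41, 0.52, 0.52, 1.10, 1.10, 0.95, 0.95),
  education  = c(0, 1, 2, 3, 0, 1, 2, 3),
  occupation = c(0, 1, 2, 3, 0, 1, 2, 3)
)

# 1. Check that every variable used in the recoding is really a column.
used <- c("education", "occu")
missing_cols <- setdiff(used, names(nf_demo))
missing_cols  # any name listed here would be looked up outside the data

# 2. Recode inside the data frame, using a column that exists.
nf_demo$edu <- factor(nf_demo$education, levels = 0:3,
                      labels = c("no edu", "primary", "secondary", "higher"))

# 3. Build the design with formulas so all design variables come from `data`.
svs_demo <- svydesign(ids = ~v021, strata = ~stra, weights = ~wgtv,
                      nest = TRUE, data = nf_demo)

# Weighted proportions of the recoded factor.
svymean(~edu, svs_demo)
```

If the check on the real nf reports occu as missing, recoding from the column that actually holds the occupation codes (after confirming its coding) should avoid the length mismatch, whether the recoding is done with mutate or with update.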

Upvotes: 1
