Reputation: 71
I have joined five datasets using the full_join function from the dplyr package. The first dataset had 6,165 rows and the second had 5,827 rows. The final joined dataset has 33,503 rows.
I used the following code to join the five datasets.
n2<-full_join(n96, n01)
n3<-full_join(n2, n06)
n4<-full_join(n3, n11)
nf<-full_join(n4, n16)
View(nf)
The final dataset looks like the following:
v000 v005 age v021 v022 v023 v024 resi region v102 education pregnant v445 v501 v717 wealth occupation marital wgtv BMI obov
<chr> <dbl> <dbl+l> <dbl> <dbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+lbl> <dbl+lb> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+lbl> <dbl+l> <dbl> <dbl> <fct>
1 NP3 412612 6 [40-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~ 2285 1 [mar~ 4 [agr~ 1 [poo~ 2 [cleric~ 1 [mar~ 0.413 22.8 0
2 NP3 412612 3 [25-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~ 2159 1 [mar~ 4 [agr~ 1 [poo~ 2 [cleric~ 1 [mar~ 0.413 21.6 0
3 NP3 412612 4 [30-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~ 2167 1 [mar~ 4 [agr~ 3 [mid~ 2 [cleric~ 1 [mar~ 0.413 21.7 0
4 NP3 412612 5 [35-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~ 2039 1 [mar~ 4 [agr~ 4 [ric~ 2 [cleric~ 1 [mar~ 0.413 20.4 0
5 NP3 412612 2 [20-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 1 [prima~ 0 [no o~ 2163 1 [mar~ 4 [agr~ 3 [mid~ 2 [cleric~ 1 [mar~ 0.413 21.6 0
6 NP3 412612 5 [35-~ 101 51 0 1 [pro~ 2 [rur~ 1 [pro~ 2 [rur~ 0 [no ed~ 0 [no o~ 3785 1 [mar~ 4 [agr~ 2 [poo~ 2 [cleric~ 1 [mar~ 0.413 37.8 2
# ... with 6 more variables: over <fct>, age1 <dbl+lbl>, working_status <dbl+lbl>, education1 <dbl+lbl>, year <dbl>, stra <fct>
As it is a complex survey dataset, I used a survey design.
svs<-svydesign(id=nf$v021, strata=nf$stra, nest=TRUE, weights=nf$wgtv, data=nf)
This works. During analysis, however, I ran into object-related errors. To fix them, I used the following code:
svs1 <-
  update(
    svs,
    one = 1,
    edu = factor(education, levels = c(0, 1, 2, 3),
                 labels = c("no edu", "primary", "secondary", "higher")),
    wealth = factor(wealth, levels = c(1, 2, 3, 4, 5),
                    labels = c("poorest", "poorer", "middle", "richer", "richest")),
    marital = factor(marital, levels = c(0, 1),
                     labels = c("never married", "married")),
    occu = factor(occu, levels = c(0, 1, 2, 3),
                  labels = c("not working", "professional/technical/manageral/clerial/sale/services", "agricultural", "skilled/unskilled manual")),
    age1 = factor(age1, levels = c(1, 2, 3),
                  labels = c("early", "mid", "late")),
    obov = factor(obov, levels = c(0, 1, 2),
                  labels = c("normal", "overweight", "obese")),
    over = factor(over, levels = c(0, 1),
                  labels = c("normal", "overweight/obese")),
    working_status = factor(working_status, levels = c(0, 1),
                            labels = c("not working", "working")),
    education1 = factor(education1, levels = c(0, 1, 2),
                        labels = c("no education", "primary", "secondary/secondry+")),
    resi = factor(resi, levels = c(0, 1), labels = c("urban", "rural"))
  )
Now I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , newnames[j], value = c(3L, 3L, 3L, :
replacement has 12674 rows, data has 33503
Could you please suggest how I can fix this error?
Upvotes: 0
Views: 51
Reputation: 389055
I am not sure how the update function works, but it seems you want to change the factor levels of the variables. You can do that in the nf data frame before passing it to the svydesign function.
library(dplyr)
nf <- nf %>%
  mutate(
    edu = factor(education, levels = c(0, 1, 2, 3),
                 labels = c("no edu", "primary", "secondary", "higher")),
    wealth = factor(wealth, levels = c(1, 2, 3, 4, 5),
                    labels = c("poorest", "poorer", "middle", "richer", "richest")),
    marital = factor(marital, levels = c(0, 1),
                     labels = c("never married", "married")),
    occu = factor(occu, levels = c(0, 1, 2, 3),
                  labels = c("not working", "professional/technical/manageral/clerial/sale/services", "agricultural", "skilled/unskilled manual")),
    age1 = factor(age1, levels = c(1, 2, 3),
                  labels = c("early", "mid", "late")),
    obov = factor(obov, levels = c(0, 1, 2),
                  labels = c("normal", "overweight", "obese")),
    over = factor(over, levels = c(0, 1),
                  labels = c("normal", "overweight/obese")),
    working_status = factor(working_status, levels = c(0, 1),
                            labels = c("not working", "working")),
    education1 = factor(education1, levels = c(0, 1, 2),
                        labels = c("no education", "primary", "secondary/secondry+")),
    resi = factor(resi, levels = c(0, 1), labels = c("urban", "rural"))
  )
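One thing that may be worth checking first: the tibble printed in the question shows a column named occupation but none named occu. If occu is not a column of nf, factor(occu, ...) could be picking up some other object of a different length from the workspace, which would be consistent with the "replacement has 12674 rows, data has 33503" message. That is only a guess from the printed output. Below is a minimal sketch on made-up data (the toy data frame, its values, and the needed vector are hypothetical) that lists the missing columns, recodes the ones that exist, and then builds the design with the formula interface of svydesign:

```r
library(dplyr)
library(survey)

# Hypothetical toy data standing in for nf: 2 strata with 2 PSUs each.
toy <- data.frame(
  v021   = c(1, 1, 2, 2, 1, 1, 2, 2),
  stra   = c(1, 1, 1, 1, 2, 2, 2, 2),
  wgtv   = c(0.5, 0.5, 1.2, 1.2, 0.8, 0.8, 1.0, 1.0),
  resi   = c(0, 1, 1, 0, 1, 1, 0, 1),
  wealth = c(1, 2, 3, 4, 5, 1, 2, 3)
)

# Columns the recoding step expects; "occu" is deliberately absent here.
needed <- c("resi", "wealth", "occu")
absent <- setdiff(needed, names(toy))
print(absent)  # columns to create or rename before recoding

# Recode the columns that do exist, before creating the design.
toy <- toy %>%
  mutate(
    resi   = factor(resi, levels = c(0, 1), labels = c("urban", "rural")),
    wealth = factor(wealth, levels = 1:5,
                    labels = c("poorest", "poorer", "middle", "richer", "richest"))
  )

# Formula interface: the variables are looked up inside `data`.
des <- svydesign(ids = ~v021, strata = ~stra, weights = ~wgtv,
                 nest = TRUE, data = toy)
```

On the real data, running setdiff() against names(nf) with the variable names used in the mutate() call would show whether any of them need to be created or renamed first.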
Upvotes: 1