dreamwalker

Reputation: 1743

strata() from the "sampling" package returns an error: arguments imply differing number of rows

I have a data frame that looks like this:

'data.frame':   1090 obs. of  8 variables:
 $ id            : chr  "INC000000209241" "INC000000218488" "INC000000218982" "INC000000225646" ...
 $ service.type  : chr  "Incident" "Incident" "Incident" "Incident" ...
 $ priority      : chr  "Critical" "Critical" "Critical" "Critical" ...

I order the data as follows:

data <- data[order(data$priority),]

I have tried converting priority to a factor, among other things, but regardless of what I try, when I run the following:

s = strata(data,c("priority"),size=c(0,0,1,5))

I always get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 1

I tried debugging the function to find out why this error is raised, but I couldn't make sense of the code. The error was raised at this step of strata():

debug: r = cbind(r, i)

Thank you very much for all your help!

Upvotes: 2

Views: 5156

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193637

The problem is that you are setting the sample size for some groups to zero, which strata() does not handle. Instead, subset your original data to drop those groups before sampling.

Here, we reproduce your problem.

library(sampling)
data(swissmunicipalities)
length(table(swissmunicipalities$REG)) # We have seven strata
# [1] 7

# Let's take two from each group
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = rep(2, 7), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 93     4      93 0.011695906       1
# 145    4     145 0.011695906       1
# 2574   1    2574 0.003395586       2
# 2631   1    2631 0.003395586       2
# 826    3     826 0.006230530       3
# 1614   3    1614 0.006230530       3
# 583    2     583 0.002190581       4
# 1017   2    1017 0.002190581       4
# 1297   5    1297 0.004246285       5
# 2535   5    2535 0.004246285       5
# 342    6     342 0.010752688       6
# 347    6     347 0.010752688       6
# 651    7     651 0.008163265       7
# 2471   7    2471 0.008163265       7

# Let's try to drop the first two groups. Oops...
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = c(0, 0, 2, 2, 2, 2, 2), 
       method="srswor")
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 0, 1
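
As an aside on why a size of zero fails: the question reports that the error is raised at r = cbind(r, i) inside strata(). A plausible reading is that a stratum of size 0 produces a zero-row data frame, which cannot be column-bound to the length-one stratum index. The sketch below reproduces that mechanism in base R only. The variable names are illustrative, and this is not the package's actual code.

```r
# Zero-row data frame, standing in for the empty sample drawn from a
# stratum whose requested size is 0 (illustrative, not the package's code)
r <- data.frame(ID_unit = integer(0))

# Length-one value, standing in for the stratum index
i <- 1

# cbind() on a data frame calls data.frame(..., check.names = FALSE),
# which refuses to combine 0 rows with 1 row
msg <- tryCatch({
  cbind(r, i)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```
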

Let's subset and try again.

swiss2 <- swissmunicipalities[!swissmunicipalities$REG %in% c(1, 2), ]
table(swiss2$REG)
strata(swiss2, 
       stratanames = c("REG"), 
       size = c(2, 2, 2, 2, 2), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 58     4      58 0.011695906       1
# 115    4     115 0.011695906       1
# 432    3     432 0.006230530       2
# 986    3     986 0.006230530       2
# 1007   5    1007 0.004246285       3
# 1150   5    1150 0.004246285       3
# 190    6     190 0.010752688       4
# 497    6     497 0.010752688       4
# 1049   7    1049 0.008163265       5
# 1327   7    1327 0.008163265       5
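
Applied to the data in the question, the same idea might look like the sketch below. The data frame here is a made-up stand-in, and the priority levels other than "Critical" ("High", "Low", "Medium") are assumptions, since the question does not list them. The named `sizes` vector is likewise just one way to keep each size attached to its stratum. Sizes are passed in the order in which the strata appear in the subsetted data, which is the order strata() uses.

```r
library(sampling)

# Hypothetical stand-in for the question's data; only "Critical" appears in
# the original post, the other priority levels are assumed for illustration.
set.seed(42)
data <- data.frame(
  id = sprintf("INC%012d", 1:40),
  priority = rep(c("Critical", "High", "Low", "Medium"), each = 10),
  stringsAsFactors = FALSE
)
data <- data[order(data$priority), ]

# Desired sample size per stratum (assumed mapping of the question's c(0, 0, 1, 5))
sizes <- c(Critical = 0, High = 0, Low = 1, Medium = 5)

# Drop the strata with a requested size of 0 before sampling
sub <- data[data$priority %in% names(sizes)[sizes > 0], ]

# Pass the sizes in the order the strata appear in the subsetted data
s <- strata(sub,
            stratanames = "priority",
            size = unname(sizes[unique(sub$priority)]),
            method = "srswor")

# ID_unit holds row positions within `sub`, so it can index the sampled rows
sampled <- sub[s$ID_unit, ]
table(sampled$priority)
```

Looking the sizes up by stratum name, rather than typing them positionally, avoids a mismatch if the row order of the data changes.
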

Upvotes: 5
