
undefined columns selected in plm() function

I ran into a strange problem with the plm() function. Here is the code:


library(data.table)  # data.table()
library(plm)         # pdata.frame(), plm()

# Data generation
n <- 500

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 50
y   <- -100*z+ 1100 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 80
y   <- -80*z+ 1200 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 30
y   <- -120*z+ 1000 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))

dtable <- merge(dt1    ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)

# Model 
dtable_p <- pdata.frame(dtable, index = "group")

mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")

Error in `[.data.frame`(x, , which) : undefined columns selected

I have checked everything I can think of, but I cannot figure out why this gives an error. The column names are correct, so why does R say the columns are undefined? Thank you!

Follow-up: I added another test using the data set that @StupidWolf used to demonstrate the problem:

data("Produc", package = "plm")
form <- log(gsp) ~ log(pc) 
Produc$group <-  Produc$region
pProduc <- pdata.frame(Produc, index = "group")

Produc$group <- rep(1:48, each = 17)  # runs after pProduc is created, so pProduc is not affected

summary(plm(form, data = pProduc, model = "pooling"))
Error in `[.data.frame`(x, , which) : undefined columns selected

Answer:



This is extremely weird, but the answer is that the index must not be named "group".

I suspect that somewhere inside plm() a column named "group" is being added to your data.frame.
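For context, the message itself is raised by base R's data-frame subsetting method `[.data.frame` whenever `x[, cols]` asks for a column name that `x` does not contain. The minimal base-R sketch below does not use plm, and its toy data frame `df` is made up for illustration; it reproduces the same message. This is consistent with the suspicion that plm looks up a column that is not there, but it does not by itself show which column plm requests.

```r
# Toy data frame with no column named "group" (made up for illustration)
df <- data.frame(sat = c(900, 1000, 1100), income = c(50, 80, 30))

# Selecting columns that exist works as expected
ok <- df[, c("sat", "income")]

# Asking for a column that does not exist triggers the error from the question;
# tryCatch() captures the condition so its message can be inspected
msg <- tryCatch(df[, c("sat", "group")], error = conditionMessage)
print(msg)
```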

We can reproduce this with the Produc example dataset that ships with plm:

data("Produc", package = "plm")
form <- log(gsp) ~ log(pc) 
Produc$group = Produc$region
pProduc <- pdata.frame(Produc, index = c("group"))
summary(plm(form, data = pProduc, model = "random"))
Error in `[.data.frame`(x, , which) : undefined columns selected

Using the original "region" column, from which "group" was copied, it works:

pProduc <- pdata.frame(Produc, index = c("region"))
summary(plm(form, data = pProduc, model = "random"))

Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = form, data = pProduc, model = "random")

Unbalanced Panel: n = 9, T = 51-136, N = 816

Effects:
                  var std.dev share
idiosyncratic 0.03691 0.19213 0.402
individual    0.05502 0.23457 0.598
theta:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8861  0.9012  0.9192  0.9157  0.9299  0.9299 

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.68180 -0.11014  0.00977 -0.00039  0.13815  0.45491 

Coefficients:
             Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept) -1.099088   0.138395  -7.9417 1.994e-15 ***
log(pc)      1.100102   0.010623 103.5627 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    459.71
Residual Sum of Squares: 30.029
R-Squared:      0.93468
Adj. R-Squared: 0.9346
Chisq: 11647.6 on 1 DF, p-value: < 2.22e-16

For your example, rename the "group" column and also convert it to a factor to avoid other errors (for "pooling", the index should be treated as categorical, not numeric):

dtable <- merge(dt1    ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)
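dtable$group = factor(dtable$group)              # treat the index as categorical
setnames(dtable, "group", "GROUP")               # rename by name rather than by column position
dtable_p <- pdata.frame(dtable, index = "GROUP")
summary(plm(sat ~ income, data = dtable_p, model = "pooling"))  # the argument is `model`, not `method`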
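The rename and factor conversion can also be wrapped in a small base-R helper, sketched below. `prepare_index()` is hypothetical (it is not part of plm or data.table), and the toy data frame only stands in for the merged `dtable`. The helper renames the index column by name, so it does not depend on where the column sits, and converts it to a factor. It is written for a plain data.frame; on a data.table, `data.table::setnames()` does the rename in place.

```r
# Hypothetical helper (not part of plm or data.table): rename the index column
# by name and convert it to a factor before building the pdata.frame.
prepare_index <- function(df, old = "group", new = "GROUP") {
  stopifnot(old %in% names(df), !(new %in% names(df)))
  names(df)[names(df) == old] <- new  # rename by name, not by column position
  df[[new]] <- factor(df[[new]])      # treat the index as categorical
  df
}

# Toy data frame standing in for the merged dtable from the question
toy <- data.frame(id = 1:6,
                  sat = c(900, 1000, 1100, 1200, 1000, 950),
                  income = c(50, 52, 80, 81, 30, 29),
                  group = rep(1:3, each = 2))

prepared <- prepare_index(toy)
str(prepared)

# With plm loaded, the prepared data could then be used as in the answer:
# dtable_p <- pdata.frame(prepare_index(as.data.frame(dtable)), index = "GROUP")
```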
