Florian Oswald

Reputation: 5134

R exams package weird behaviour with dplyr

I have noticed strange behaviour in the R exams package when I load the dplyr library. The example below only works if I explicitly call the dplyr namespace, as indicated in the comments. Note that the error only occurs in a fresh session, i.e. you need to restart R in order to see what I see. Save the exercise shown below as exam.Rmd and then call:

```r
library(exams)
library(dplyr)
exams2html("exam.Rmd")  # exam.Rmd in the working directory
```

This is exam.Rmd:

````markdown
```{r datagen,echo=FALSE,results='hide',warning=FALSE,message=FALSE}
df = data.frame(i = 1:4, y = 1:4, group = paste0("g",rep(1:2,2)))
# works:
b2 = diff(dplyr::filter(df,group!="g1")$y)
b3 = diff(dplyr::filter(df,group!="g2")$y)
# messes up the complete exercise:
# b2 = diff(filter(df,group!="g1")$y)
# b3 = diff(filter(df,group!="g2")$y)
nq = 2
questions <- solutions <- explanations <- rep(list(""), nq)
type <- rep(list("num"),nq)

questions[[1]] = "What is the value of $b_2$ rounded to 3 digits?"
questions[[2]] = "What is the value of $b_3$ rounded to 3 digits?"
solutions[[1]] = b2
solutions[[2]] = b3
explanations[[1]] = paste("You have to subtract the conditional mean of group 2 from the reference group 1. This gives:", b2)
explanations[[2]] = paste("You have to subtract the conditional mean of group 3 from the reference group 1. This gives:", b3)
```


Question
========
You are given the following dataset on two variables `y` and `group`.

```{r showdata,echo=FALSE}
# kable(df,row.names = FALSE,align = "c")
df
```

some text with math

$y_i = b_0 + b_2 g_{2,i}  + b_3 g_{3,i} + e_i$

```{r questionlist, echo = FALSE, results = "asis"}
answerlist(unlist(questions), markup = "markdown")
```

Solution
========

```{r sollist, echo = FALSE, results = "asis"}
answerlist(unlist(explanations), markup = "markdown")
```

Meta-information
================
extype: cloze
exsolution: `r paste(solutions,collapse = "|")`
exclozetype: `r paste(type, collapse = "|")`
exname: Dummy Manual computation
extol: 0.001
````

Upvotes: 1

Views: 231

Answers (2)

Achim Zeileis

Reputation: 17183

Thanks for raising this issue, and thanks to @hrbrmstr for explaining one part of the problem. However, one part of the explanation is still missing:

  • Of course, the root of the problem is that both stats and dplyr export different filter() functions, and which one is found first can depend on various factors.
  • In an interactive session it is sufficient to load the packages in the right order: stats is attached automatically at startup and dplyr subsequently, so dplyr::filter() masks stats::filter(). Hence this works:
    library("knitr")
    library("dplyr")
    knit("exam.Rmd")
  • It took me a moment to figure out what is different when you do:
    library("exams")
    library("dplyr")
    exams2html("exam.Rmd")
  • It turns out that in the latter code chunk knit() is called by exams2html(), so the chunks are evaluated in an environment whose enclosures include the exams namespace. Because the NAMESPACE of the exams package fully imported the entire stats package, stats::filter() is found before dplyr::filter() unless the code is evaluated in an environment where dplyr is visible, such as the .GlobalEnv. (For more details see the answer by @hrbrmstr.)
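To make the lookup order in the last point concrete, here is a toy model built from plain environments. It is a sketch under assumptions, not the actual exams or dplyr code: `fake_dplyr` is a hypothetical stand-in for dplyr attached to the search path, and `imports_env`, `ns_env` and `chunk_env` are hand-built stand-ins for a namespace's imports environment, the namespace itself, and the frame in which knit() evaluates the chunks.

```r
# Toy model of R's lexical lookup for code run "inside" a package function.
# Assumption: hand-built environments mirroring the usual chain
# (function frame -> namespace -> imports -> base namespace -> global env -> search path);
# this is not the real exams or dplyr code.

# Hypothetical stand-in for dplyr::filter(), attached to the search path.
attach(list(filter = function(.data, ...) "dplyr-like filter"),
       name = "fake_dplyr", warn.conflicts = FALSE)

# Imports environment: import(stats) makes every stats export visible here.
imports_env <- new.env(parent = .BaseNamespaceEnv)
assign("filter", stats::filter, envir = imports_env)

# Namespace environment, whose parent is its imports environment.
ns_env <- new.env(parent = imports_env)

# Frame of a namespace function; with envir = parent.frame(), knit() would
# evaluate the chunks in a frame like this one.
chunk_env <- new.env(parent = ns_env)

from_chunk  <- get("filter", envir = chunk_env)    # found in imports_env first
from_global <- get("filter", envir = globalenv())  # found on the search path

identical(from_chunk, stats::filter)  # returns TRUE
from_global(data.frame())             # returns "dplyr-like filter"

detach("fake_dplyr")
```

The same name resolves to two different functions purely because the lookup starts from two different environments; attaching dplyr only affects the search path, which the chunk frame reaches last.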

As there is no pressing reason for the exams package to import the entire stats package, I have changed the NAMESPACE to import only the required functions selectively (which does not include the filter() function). Please install the development version from R-Forge:

```r
install.packages("exams", repos = "http://R-Forge.R-project.org")
```

Your .Rmd can then be compiled without the dplyr:: prefix just by including library("dplyr"), either within the .Rmd or before calling exams2html(). Both should now work as expected.
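The effect of importing stats selectively can be sketched with the same kind of toy model (hand-built environments again, not the real exams namespace). The two imported functions, `runif` and `sd`, are hypothetical choices; the point is only that `filter` is no longer among the imports, so the lookup falls through to the search path where the dplyr stand-in lives.

```r
# Same toy model, but with a selective importFrom() instead of import(stats).
# Assumption: hand-built environments; "runif" and "sd" are hypothetical
# examples of selectively imported functions, not the actual exams imports.

# Hypothetical stand-in for dplyr::filter(), attached to the search path.
attach(list(filter = function(.data, ...) "dplyr-like filter"),
       name = "fake_dplyr", warn.conflicts = FALSE)

imports_env <- new.env(parent = .BaseNamespaceEnv)
for (fn in c("runif", "sd")) {
  # Copy only the named stats exports, as importFrom() would.
  assign(fn, getExportedValue("stats", fn), envir = imports_env)
}
ns_env    <- new.env(parent = imports_env)
chunk_env <- new.env(parent = ns_env)

# filter is not in imports_env, so the lookup continues to the search path.
found <- get("filter", envir = chunk_env)
found(data.frame())  # returns "dplyr-like filter"

detach("fake_dplyr")
```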

Upvotes: 5

hrbrmstr

Reputation: 78792

Using your exam.Rmd, this is the source pane where I'm about to hit cmd-enter:

[Screenshot of the source pane]

(I added quiet=FALSE so I could see what was going on).

Here's the console output after cmd-enter:

[Screenshot of the console output]

And here's the output:

[Screenshot of the rendered output]

If you read all the way through to the help on knit:

  • envir: Environment in which code chunks are to be evaluated, for example, parent.frame(), new.env(), or globalenv()).

So parent.frame() or globalenv() is required, rather than what you did (you don't seem to fully understand environments). You get TRUE from your exists() call because inherits = TRUE is the default in exists(), which tells the function to "[search] the enclosing frames of the environment" (from the help on exists).
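A minimal sketch of the `inherits` behaviour of exists(), using hand-built environments (the names `outer_env`, `inner_env` and `b2` are illustrative, not taken from the original code):

```r
# Hypothetical environments: b2 lives only in the enclosing environment.
outer_env <- new.env(parent = emptyenv())
assign("b2", 1, envir = outer_env)
inner_env <- new.env(parent = outer_env)

exists("b2", envir = inner_env)                    # TRUE: inherits = TRUE by default
exists("b2", envir = inner_env, inherits = FALSE)  # FALSE: only inner_env is searched
```

So a TRUE from exists() only says that the name is reachable from the given environment, not that it is defined in that environment itself.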

And you should care deeply about source code and about triaging errors. You're using a programming language and open source software. You are right that library(dplyr) didn't work inside the Rmd, and that is due to some terrible code choices in this "great" package, which you don't want pointed out since you don't want to look at source code.

I'll stop here, as I can do no more for you. I just hope others benefit from this.

Upvotes: 1
