user3684014

Reputation: 1175

R: What's actually loaded when library() is called?

Here is a snippet of R script doing beta regression on data "GasolineYield":

library("betareg")
data("GasolineYield", package = "betareg")
gy_logit <- betareg(yield ~ batch + temp, data = GasolineYield)

It works fine, but if I run the code with the second line deleted, it fails with the message:

Error in terms.formula(form, ...) : object 'GasolineYield' not found

But isn't the data frame GasolineYield part of the betareg package? What actually happens when I call library("betareg")? Isn't all the data inside the package automatically loaded into the current environment? Could anybody help me understand the mechanism behind this?

Upvotes: 1

Views: 88

Answers (1)

farnsy

Reputation: 2470

For the most part, data is included in R packages to support examples and documentation rather than anything mission critical. That is why many packages do not make their datasets available automatically, and you have to load them with the data() function. This is a good thing: for packages that primarily provide functions, loading all their data every time would waste memory, time, and namespace when users rarely need it.
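To see what data() does, here is a minimal sketch that loads a dataset into a scratch environment. It uses "mtcars" from the base "datasets" package only because that ships with R; the variable names (scratch, before, after, available) are just for illustration.

```r
# Load a dataset into a scratch environment so the effect of data() is visible.
# "mtcars" from the base "datasets" package stands in for any packaged dataset.
scratch <- new.env()

# The dataset is not in the scratch environment yet.
before <- exists("mtcars", envir = scratch, inherits = FALSE)

# data() copies the dataset into the environment given by 'envir'.
data("mtcars", package = "datasets", envir = scratch)
after <- exists("mtcars", envir = scratch, inherits = FALSE)

# List the dataset names a package provides without loading any of them.
available <- data(package = "datasets")$results[, "Item"]

print(c(before = before, after = after))
print(head(available))
```

For the case in the question, the same listing call with package = "betareg" should show which datasets that package provides, assuming it is installed.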

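When you attach a package with library(), only the objects that the package author exports in the "NAMESPACE" file are made available on the search path. Datasets are handled separately: the "DESCRIPTION" file has a field called "LazyData" that controls them. If it is set to true, the datasets can be used by name as soon as the package is attached, and each one is read into memory only when it is first used; otherwise you have to call data(). By the way, packages often contain functions that are for internal use only and are not exported in the NAMESPACE file.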
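You can inspect both files' effects from the R console. The sketch below uses the "stats" package only because it ships with R; lazy_data_field is a hypothetical helper written for this example, not part of any package, and the betareg check runs only if that package happens to be installed.

```r
# Names exported via the NAMESPACE file of "stats".
ns <- asNamespace("stats")
exported <- getNamespaceExports("stats")

# Everything else defined in the namespace is internal to the package.
internal <- setdiff(ls(ns, all.names = TRUE), exported)

lm_is_exported <- "lm" %in% exported
n_internal <- length(internal)

# Hypothetical helper: read the LazyData field from an installed package's
# DESCRIPTION file; returns NA if the field is absent.
lazy_data_field <- function(pkg) {
  utils::packageDescription(pkg, fields = "LazyData")
}

print(lm_is_exported)
print(n_internal)

# Only query betareg if it is actually installed on this machine.
if (requireNamespace("betareg", quietly = TRUE)) {
  print(lazy_data_field("betareg"))
}
```

Internal objects can still be reached with the ::: operator (pkg:::name), but they are not part of the package's public interface.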

TL;DR: the package author determines what will be available when the package is loaded, and specifies it in the NAMESPACE file (exported functions and other objects) and the DESCRIPTION file (the LazyData field for datasets).

Upvotes: 2
