Iterator

Reputation: 20560

Cleaning up function list in an R package with lots of functions

[Revised based on suggestion of exporting names.] I have been working on an R package that is nearing about 100 functions, maybe more.

I want to have, say, 10 visible functions and each may have 10 "invisible" sub-functions.

Is there an easy way to select which functions are visible, and which are not?

Also, in the interest of avoiding 'diff', is there a command like "all.equal" that can be applied to two different packages to see where they differ?

Upvotes: 6

Views: 494

Answers (4)

Andrie

Reputation: 179408

The answer is almost certainly to create a package. Some rules of thumb may help in your design choice:

  • A package should solve one problem
  • If you have functions that solve a different problem, put them in a separate package

For example, have a look at the ggplot2 package:

  • ggplot2 is a package that creates wonderful graphics
  • It imports plyr, a package that gives a consistent syntax and approach to solve the Split, Apply, Combine problem
  • It depends on reshape2, a package with only a few functions that turns wide data into long, and vice versa.

The point is that all of these packages were written by a single author, i.e. Hadley Wickham.

If you do decide to make a package, you can control the visibility of your functions:

  • Only functions that are exported are directly visible to users once the package is attached
  • You can additionally mark the documentation of some functions with \keyword{internal} (in roxygen2, @keywords internal), which keeps them out of automatically generated lists of functions.
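
As a minimal sketch of the two bullets above, assuming you document with roxygen2 (the answer does not say which tool to use), an exported function and an internal helper might look like this. Both function names are hypothetical.

```r
# User-facing function: the @export tag makes roxygen2 write
# export(summarise_vector) into NAMESPACE. (Hypothetical example.)

#' Summarise a numeric vector
#'
#' @param x A numeric vector.
#' @return A named numeric vector with the mean and standard deviation.
#' @export
summarise_vector <- function(x) {
  c(mean = mean(x), sd = helper_sd(x))
}

# Internal helper: there is no export tag, so it stays out of NAMESPACE.
# The internal keyword keeps its help page out of the help index.
# (Hypothetical example.)

#' Sample standard deviation helper
#'
#' @param x A numeric vector.
#' @keywords internal
helper_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}
```

Inside the package both functions can call each other freely; only summarise_vector() would be visible to a user after library().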

If you decide to develop your own package, I strongly recommend the devtools package, and reading the devtools wiki.

Upvotes: 4

Gavin Simpson

Reputation: 174778

I think you should organise your package and code the way you feel most comfortable with; it is your package after all. NAMESPACE can be used to control what gets exposed to the user up-front, as others have mentioned. You also don't need to write full documentation for every function, just for the main user-called functions: for the support functions you don't want people to know too much about, add \alias{} tags to the Rd files of the main functions, or hide them on a package.internals.Rd man page.

That being said, if you want people to help develop your package, or run with it and do amazing things, the better organised it is, the easier that job will be. So lay out your functions logically: perhaps one file per function, named after the function, or all the related functions grouped into a single R file. But be consistent in whichever approach you use.

If you have generic functions that have more general use, consider splitting those functions out into a separate package that others can use, without having to depend on your mega package with the extra cruft that is more specific. Your package can then depend on this generic package, as can packages of other authors. But don't split packages up just for the sake of making them smaller.

Upvotes: 4

Sacha Epskamp

Reputation: 47541

You can make a file called NAMESPACE in the base directory of your package. In this you can define which functions you want to export to the user, and you can also import functions from other packages. Exporting will make a function usable, and import will transfer a function from another package to you without making it available to the user (useful if you just need one function and don't want to require your users to load another package when they load yours).

A truncated part of my package's NAMESPACE:

useDynLib(qgraph)
export(qgraph)
(...)
importFrom(psych,"principal")
(...)
import(plyr)

which respectively loads the compiled functions, makes the function qgraph() available, imports from psych the principal function and imports from plyr all functions that are exported in plyr's NAMESPACE.

For more details read:

http://cran.r-project.org/doc/manuals/R-exts.pdf

Upvotes: 6

Dirk is no longer here

Reputation: 368201

If your reformulated question is about 'how to organise large packages', then this may apply:

  • NAMESPACE allows for very fine-grained exporting of functions: your user would see only the 10 visible functions

  • even the invisible functions are accessible if you or your users know about them; that is done via the ::: (triple colon) operator

  • packages do come in all sizes and shapes; one common rule about 'when to split' is to do so as soon as you have functionality that is of use in different contexts
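
To see the difference between exported and non-exported functions on a package you already have, here is a small sketch using the stats package that ships with R. It relies on plot.lm being a non-exported S3 method in stats, which holds in the R versions I know of but is an assumption that could change.

```r
# Names that stats exports (what a user sees after library(stats))
exported <- getNamespaceExports("stats")

# Every object defined in the namespace, exported or not
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that live in the namespace but are hidden from users
internal <- setdiff(all_objects, exported)
length(exported)
length(internal)

# An exported function is reachable with the double colon
stats::median(c(1, 5, 9))

# A non-exported function needs the triple colon
# (assumes plot.lm is not exported from stats)
hidden_fun <- stats:::plot.lm
is.function(hidden_fun)
```

The same calls work on your own package once it is installed, which is a quick way to check that only the intended 10 functions are exported.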

As for diff on packages: Huh? Packages are not usually all that close so that one would need a comparison function. The diff command is indeed quite useful on source code. You could use a hash function on binary code if you really wanted to but I am still puzzled as to why one would want to.
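
If you did want to go the hash route, a rough sketch could compare two source trees file by file with tools::md5sum(). The helper name compare_dirs and the pkg_v1/pkg_v2 directories are made up for illustration, and this only tells you which files differ, not how.

```r
# Hypothetical helper: report which files differ between two directory trees
compare_dirs <- function(dir_a, dir_b) {
  files_a <- list.files(dir_a, recursive = TRUE)
  files_b <- list.files(dir_b, recursive = TRUE)
  common  <- intersect(files_a, files_b)

  # tools::md5sum() returns one MD5 hash per file path
  hash_a <- unname(tools::md5sum(file.path(dir_a, common)))
  hash_b <- unname(tools::md5sum(file.path(dir_b, common)))

  list(
    only_in_a = setdiff(files_a, files_b),
    only_in_b = setdiff(files_b, files_a),
    changed   = common[hash_a != hash_b]
  )
}

# Toy example: two made-up "versions" of a package in temporary directories
root <- tempfile("pkgcmp")
v1 <- file.path(root, "pkg_v1")
v2 <- file.path(root, "pkg_v2")
dir.create(file.path(v1, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(v2, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("f <- function() 1", file.path(v1, "R", "f.R"))
writeLines("f <- function() 2", file.path(v2, "R", "f.R"))
writeLines("g <- function() 3", file.path(v2, "R", "g.R"))

res <- compare_dirs(v1, v2)
print(res)
```

For anything beyond a yes/no answer per file, diff on the source remains the more informative tool.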

Upvotes: 3
