jangorecki

Reputation: 16697

Extract comments from R source files, keeping the functions in which they occur

I would like to extract comments (matching given patterns) from my R source script, keeping track of the functions in which they occur.

The goal is to write documentation comments inside function bodies using classic markdown checkboxes, - [ ] or - [x], and to extract those comments for further processing as a list of character vectors, which I can easily write to a new .md file.

Reproducible example and expected output below.

# preview the 'data'
script_body = c('# some init comment - not matching pattern','g = function(){','# - [x] comment_g1','# - [ ] comment_g2','1','}','f = function(){','# - [ ] comment_f1','# another non match to pattern','g()+1','}')
cat(script_body, sep = "\n")
# # some init comment - not matching pattern
# g = function(){
#     # - [x] comment_g1
#     # - [ ] comment_g2
#     1
# }
# f = function(){
#     # - [ ] comment_f1
#     # another non match to pattern
#     g()+1
# }

# populate R source file
writeLines(script_body, "test.R")

# test it 
source("test.R")
f()
# [1] 2

# expected output
r = magic_function_get_comments("test.R", starts.with = c(" - [x] "," - [ ] "))
# r = list("g" = c(" - [x] comment_g1"," - [ ] comment_g2"), "f" = " - [ ] comment_f1")
str(r)
# List of 2
#  $ g: chr [1:2] " - [x] comment_g1" " - [ ] comment_g2"
#  $ f: chr " - [ ] comment_f1"

Upvotes: 4

Views: 1074

Answers (2)

hrbrmstr

Reputation: 78792

It's unlikely anyone is going to write the grep / stringr::str_match part for you (this isn't a grunt code-writing service). But, the idiom for iterating over parsed function source might be useful enough to a broader audience to warrant inclusion.

CAVEAT: This source()s the .R file, meaning it evaluates it.

#' Extract whole comment lines from an R source file
#'
#' \code{source()} an R source file into a temporary environment then
#' iterate over the source of \code{function}s in that environment and
#' \code{grep} out the whole line comments which can then be further
#' processed.
#' 
#' @param source_file path name to source file that \code{source()} will accept
extract_comments <- function(source_file) {

  tmp_env <- new.env(parent=sys.frame())
  source(source_file, tmp_env, echo=FALSE, print.eval=FALSE, verbose=FALSE, keep.source=TRUE)
  funs <- Filter(is.function, sapply(ls(tmp_env), get, tmp_env))

  lapply(funs, function(f) {
    # get function source
    function_source <- capture.output(f)
    # only get whole line comment lines
    comments <- grep("^[[:blank:]]*#", function_source, value=TRUE)
    # INCANT YOUR GREP/REGEX MAGIC HERE 
    # instead of just returning the comments
    # since this isn't a free code-writing service
    comments
  })

}

str(extract_comments("test.R"))
## List of 2
##  $ f: chr [1:2] "# - [ ] comment_f1" "# another non match to pattern"
##  $ g: chr [1:2] "# - [x] comment_g1" "# - [ ] comment_g2"
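
For completeness, one possible way to fill in the filtering step is sketched below. It is an illustration only: `filter_comments` is a hypothetical helper name, the `starts.with` argument is borrowed from the question, and the input list simply reproduces the output shown above so the block runs on its own.

```r
# Hypothetical helper (not part of extract_comments): keep only the comments
# whose text after the leading "#" starts with one of the given prefixes.
filter_comments <- function(comments, starts.with = c(" - [x] ", " - [ ] ")) {
  # drop leading blanks and the first "#", keep the rest verbatim
  body <- sub("^[[:blank:]]*#", "", comments)
  # fixed-string prefix test, so "[" and "]" need no regex escaping
  keep <- vapply(
    body,
    function(x) any(startsWith(x, starts.with)),
    logical(1),
    USE.NAMES = FALSE
  )
  body[keep]
}

# Same shape as the result of extract_comments("test.R") shown above
comments <- list(
  f = c("# - [ ] comment_f1", "# another non match to pattern"),
  g = c("# - [x] comment_g1", "# - [ ] comment_g2")
)

r <- lapply(comments, filter_comments)
str(r)
## List of 2
##  $ f: chr " - [ ] comment_f1"
##  $ g: chr [1:2] " - [x] comment_g1" " - [ ] comment_g2"
```

Using `startsWith()` (available since R 3.3.0) rather than a regular expression avoids having to escape the square brackets in the checkbox prefixes.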

Upvotes: 2

Konrad Rudolph

Reputation: 545588

Here’s a stripped-down, unevaluated variant of what hrbrmstr has done:

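get_comments = function (filename) {
    # TRUE if `expr` names one of the assignment operators/functions
    is_assign = function (expr)
        as.character(expr) %in% c('<-', '<<-', '=', 'assign')

    # TRUE for top-level calls of the form `name <- function (...) ...`
    is_function = function (expr)
        is.call(expr) && is_assign(expr[[1]]) && is.call(expr[[3]]) && expr[[3]][[1]] == quote(`function`)

    # parse without evaluating; keep.source = TRUE retains the srcrefs (and thus the comments)
    source = parse(filename, keep.source = TRUE)
    functions = Filter(is_function, source)
    # the assignment target (second element of each call) is the function name
    fun_names = as.character(lapply(functions, `[[`, 2))
    # each srcref covers the source lines of one assignment; keep the whole-line comments
    setNames(lapply(attr(functions, 'srcref'), grep,
                    pattern = '^\\s*#', value = TRUE), fun_names)
}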

This comes with a caveat: since we don’t evaluate the source, we may miss function definitions (for instance, we wouldn’t find f = local(function (x) x)). The above function uses a simple heuristic to find function definitions (it looks at all simple assignments of a function expression to a variable).
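
A small illustration of that caveat, using two made-up one-line definitions (the names `g` and `f` are only examples): looking at the head of each right-hand side shows why the heuristic sees the first as a function definition but not the second.

```r
# Parse two assignments without evaluating them
exprs <- parse(
  text = c("g = function () 1", "f = local(function (x) x)"),
  keep.source = TRUE
)

# Head of the right-hand side of each top-level assignment
rhs_head <- vapply(exprs, function(e) as.character(e[[3]][[1]]), character(1))
rhs_head
## [1] "function" "local"
```

Only the first right-hand side is a direct call to `function`; the second is a call to `local`, so a purely syntactic check cannot tell that it evaluates to a function.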

This can only be fixed using eval (or source), which comes with its own caveats; for instance, it’s a security risk to execute files from an unknown source.

Upvotes: 4
