Andrew Bolton

Reputation: 93

How to pass bash variable into R script

I have a couple of R scripts that process data in a particular input folder. I have a few folders I need to run these scripts on, so I started writing a bash script to loop through the folders and run the R scripts on each one.

I'm not familiar with R at all (the script was written by a previous worker, and it's basically a black box to me), and I'm inexperienced with passing variables between scripts, especially across multiple languages. There's also an issue when I call source("$SWS_output/Step_1_Setup.R") here: R isn't reading $SWS_output as a variable, but as a literal string.
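To show what I mean, here's a minimal sketch of the quoting behavior I'm seeing (the path is made up):

```shell
#!/bin/sh
SWS_output="/tmp/7_SKSattempt4_results"
# Single quotes: $SWS_output reaches R as literal text
echo 'source("$SWS_output/Step_1_Setup.R")'
# Double quotes: the shell expands the variable first
echo "source(\"$SWS_output/Step_1_Setup.R\")"
```

The first echo prints the dollar sign and variable name verbatim; the second prints the expanded path.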

Here's my bash script:

#!/bin/bash

# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"

# Output
SWS_output="$workspace/7_SKSattempt4_results/"

# create output directory
mkdir -p $SWS_output

# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output

# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
        qdir_name=`basename $qdir`
        echo -e 'source("$SWS_output/Step_1_Setup.R") \n source("$SWS_output/Step_2_data.R") \n  q()' | R --no-save

done

I need to pass the variable "qdir" into the second R script (Step_2_data.R) to tell it which folder to process.

Thanks!

Upvotes: 2

Views: 7579

Answers (2)

Andrew Bolton

Reputation: 93

Thanks for all the answers; they were very helpful. I was able to get a solution that works. Here's my completed script.

#!/bin/bash

# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"

# Output
SWS_output="$workspace/7_SKSattempt4_results"

# create output directory
mkdir -p $SWS_output

# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output

cd $SWS_output

# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
        qdir_name=`basename $qdir`
        echo $qdir_name
        export VARIABLENAME=$qdir
        echo -e 'source("Step_1_Setup.R") \n source("Step_2_Data.R") \n q()' | R --no-save --slave

done

And then the R script looks like this:

qdir<-Sys.getenv("VARIABLENAME")
pathname<-qdir[1]

As a couple of comments have pointed out, this isn't best practice, but this worked exactly as I wanted it to. Thanks!
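To make the environment-variable handoff explicit, here is a sketch that substitutes a shell child process for R (the path is made up): an exported variable is inherited by any child process the loop spawns, which is why R can pick up VARIABLENAME via Sys.getenv().

```shell
#!/bin/bash
export VARIABLENAME="/tmp/results/folder_a/"
# A child process (here a subshell standing in for R) inherits the
# exported variable, just as R reads it with Sys.getenv("VARIABLENAME")
child_view=$(sh -c 'echo "$VARIABLENAME"')
echo "$child_view"
```

Note that without `export`, the variable would be local to the parent shell and the child (R) would see an empty string.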

Upvotes: 3

pauljohn32

Reputation: 2265

My previous answer was incomplete. Here is a better effort to explain command line parsing.

It is pretty easy to use R's commandArgs function to process command-line arguments. I wrote a small tutorial: https://gitlab.crmda.ku.edu/crmda/hpcexample/tree/master/Ex51-R-ManySerialJobs. In cluster computing this works very well for us. The whole hpcexample repo is open source/free.

The basic idea is that you can run R from the command line with arguments, as in:

R --vanilla -f r-clargs-3.R --args runI=13 parmsC="params.csv" xN=33.45

In this case, my R program is the file r-clargs-3.R, and the three space-separated arguments the file will import are runI, parmsC, and xN. You can add as many of these space-separated parameters as you like. What they are called is entirely at your discretion, but they must be separated by spaces and there must be NO SPACE around the equals signs. Character-string values should be quoted.

My habit is to name the arguments with a suffix: "I" hints that the value is an integer, "C" a character string, and "N" a floating-point number.

In the file r-clargs-3.R, include some code to read the arguments and sort through them. For example, my tutorial's example begins:

cli <- commandArgs(trailingOnly = TRUE) 
args <- strsplit(cli, "=", fixed = TRUE)
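After the strsplit() call, args is a list with one element per argument, each split into a name/value pair (e.g. "runI=13" becomes c("runI", "13")). The same split can be sketched in shell to show what each element looks like:

```shell
# Sketch of the name/value split applied to one argument string
arg='runI=13'
name=${arg%%=*}    # text before the first "="
value=${arg#*=}    # text after the first "="
echo "$name $value"
```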

The rest of the work is sorting through args. This is my most evolved stanza for doing so: it looks for the suffixes "I", "N", "C", and "L" (for logical) and then coerces each input to the correct type (all inputs arrive as character strings unless we coerce them with as.integer(), etc.):

for (e in args) {
    argname <- e[1]
    if (! is.na(e[2])) {
        argval <- e[2]
        ## regular expression to delete initial \" and trailing \"
        argval <- gsub("(^\\\"|\\\"$)", "", argval)
    }
    else {
        # If arg specified without value, assume it is bool type and TRUE
        argval <- TRUE
    }

    # Infer type from last character of argname, cast val
    type <- substring(argname, nchar(argname), nchar(argname))
    if (type == "I") {
        argval <- as.integer(argval)
    }
    if (type == "N") {
        argval <- as.numeric(argval)
    }
    if (type == "L") {
        argval <- as.logical(argval)
    }
    assign(argname, argval)
    cat("Assigned", argname, "=", argval, "\n")
}

That will create variables in the R session named paramsC, runI, and xN.

The convenience of this approach is that the same base R code can be run with 100s or 1000s of command parameter variations. Good for Monte Carlo simulation, etc.
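A sketch of how such a sweep could be launched from bash (a dry run that only prints the commands; r-clargs-3.R and the parameter names are from the example above):

```shell
#!/bin/bash
# Dry run: print one R invocation per parameter variation instead of
# executing it, to show how a sweep over runI would be launched
for run in 1 2 3; do
    echo "R --vanilla -f r-clargs-3.R --args runI=$run parmsC=\"params.csv\""
done
```

Dropping the echo (and adding any scheduler wrapper your cluster needs) turns the dry run into the real sweep.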

Upvotes: 4
