Mohammed Barakat
Mohammed Barakat

Reputation: 59

Referencing data.table columns in Functions in R

In SAP, I upload text files of transactions that are identified with two known fields: "Document No" and "Position". Multiple transactions in the file can have one UniqueID. Transactions that share the same UniqueID have one Document No, and have serial Positions. Below is a sample of the text file.

UniqueID Document No Position
200 1 1
200 1 2
300 2 1
300 2 2
400 3 1
400 3 2

I managed to write a code to produce this Document No and Position serial for a sample data (tblSAP). The code below works well. However, I need to have this code as a function as I have many processed tables that need to be converted into SAP text files. Hence, I need to convert this code into a function.

library(data.table)
# sample data table
tblSAP <- data.table(
  UniqueID = c(200,200,300,300,400,400)
  
)

# code to generate Document No and Position fields

## Sort entire tblSAP table
tblSAP <- setorder(tblSAP,cols=UniqueID)

## create a temp table of UniqueID 
dfUniqueID <- tblSAP[,.(UniqueID)]

## create "Document No" and "Position" count
DT1 <- dfUniqueID[,.N, by=.(UniqueID, "Document No"=rleid(UniqueID))]

## extend UniqueIDs with Positions
DT2 <- dfUniqueID[,seq_len(.N), by = UniqueID] |> 
  setnames(old = "V1",new = "Position")

## Join UniqueID, Document Number, and Positions in one table
DT3 <- DT1[DT2,on = .(UniqueID)][,-c("N")]

## Bind SAP Upload table with the columns of Document No and Position from DT3 table
tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])

When I converted the code into a function, I only replaced the UniqueID by an input parameter in the function as shown. But it seems there's something wrong in referencing the tblSAP column (UniqueID) in the function. I receive the error of "Error in setorderv(x, cols, order, na.last) : some columns are not in the data.table: IdFld".

Here's the function that I wrote:

# function to generate Document No and Position fields
SAPDocPos <- function(tblSAP,IdFld){
  
  ## Sort entire tblSAP table
  tblSAP <- setorder(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.(IdFld)]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[,.N, by=.(IdFld, "Document No"=rleid(IdFld))]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[,seq_len(.N), by = IdFld] %>% 
    setnames(old = "V1",new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2,on = .(IdFld)][,-c("N")]
  
  ## Bind SAP Upload table with the columns of Document No and Position from DT3 table
  tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])
  return(tblSAP)
}

SAPDocPos(tblSAP,UniqueID)

Assistance in resolving this case is highly appreciated.

Upvotes: 1

Views: 79

Answers (1)

r2evans
r2evans

Reputation: 160687

A few changes:

  • in order to use the symbol UniqueID, we need to use deparse(substitute(.)) to convert it into a character; in general I tend to recommend against using non-standard evaluation (as you are attempting to do here), especially when the only value it adds is the omission of quotes ... however, I'll keep it here for demonstration

  • setorder requires non-standard evaluation, but we can use setorderv which expects a character vector

  • I'll use .SDcols= to subset by columns instead of attempting .(UniqueID)

  • for DT1, the by= can also be a character vector, but that requires that `Document No` be pre-defined, so this is updated a little

  • perhaps just a nuance, I shifted from %>% to |> to reduce dependencies; not required if you already depend on dplyr or magrittr elsewhere

  • just a preference, but the literal return(.) statement is only needed when you need the function to return a value earlier in the code, such as in an if block; since a function always returns the value of the last expression, we can omit it. (Calling return(tblSAP) explicitly is fine but adds unnecessary function calls to the stack.)

SAPDocPos <- function(tblSAP,IdFld){
  IdFld <- deparse(substitute(IdFld))
  
  ## Sort entire tblSAP table
  tblSAP <- setorderv(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
    setnames(old = "V1", new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
  
  ## Bind SAP Upload table with the columns of Document No and Position from DT3 table
  tblSAP <- cbind(tblSAP, DT3[, -c(IdFld), with=FALSE])

  tblSAP
}

SAPDocPos(tblSAP, UniqueID)
#    UniqueID Document No Position
#       <num>       <int>    <int>
# 1:      200           1        1
# 2:      200           1        2
# 3:      300           2        1
# 4:      300           2        2
# 5:      400           3        1
# 6:      400           3        2

If you want to go "all in" on data.table's referential (in-place) semantics, you can add some set calls (more-direct data.table value-setting) to add the columns to the original frame. NOTE: this is in-line with data.table but at odds with base-R's copy-on-write semantics.

SAPDocPos <- function(tblSAP,IdFld){
  IdFld <- deparse(substitute(IdFld))
  
  ## Sort entire tblSAP table
  tblSAP <- setorderv(tblSAP,cols=IdFld)
  
  ## create a temp table of IdFld 
  dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
  
  ## create "Document No" and "Position" count
  DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
  
  ## extend IdFlds with Positions
  DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
    setnames(old = "V1", new = "Position")
  
  ## Join IdFlds, Document Number, and Positions in one table
  DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
  
  set(tblSAP, i = NULL, j = "Document No", value = DT3[["Document No"]])
  set(tblSAP, i = NULL, j = "Position", value = DT3[["Position"]])

  tblSAP
}

tblSAP
#    UniqueID
#       <num>
# 1:      200
# 2:      200
# 3:      300
# 4:      300
# 5:      400
# 6:      400
SAPDocPos(tblSAP, UniqueID) # not re-assigning over tblSAP
tblSAP
#    UniqueID Document No Position
#       <num>       <int>    <int>
# 1:      200           1        1
# 2:      200           1        2
# 3:      300           2        1
# 4:      300           2        2
# 5:      400           3        1
# 6:      400           3        2

Upvotes: 1

Related Questions