Reputation: 59
In SAP, I upload text files of transactions that are identified with two known fields: "Document No" and "Position". Multiple transactions in the file can have one UniqueID. Transactions that share the same UniqueID have one Document No, and have serial Positions. Below is a sample of the text file.
UniqueID | Document No | Position |
---|---|---|
200 | 1 | 1 |
200 | 1 | 2 |
300 | 2 | 1 |
300 | 2 | 2 |
400 | 3 | 1 |
400 | 3 | 2 |
I managed to write a code to produce this Document No and Position serial for a sample data (tblSAP). The code below works well. However, I need to have this code as a function as I have many processed tables that need to be converted into SAP text files. Hence, I need to convert this code into a function.
library(data.table)
# sample data table
tblSAP <- data.table(
UniqueID = c(200,200,300,300,400,400)
)
# code to generate Document No and Position fields
## Sort entire tblSAP table
tblSAP <- setorder(tblSAP,cols=UniqueID)
## create a temp table of UniqueID
dfUniqueID <- tblSAP[,.(UniqueID)]
## create "Document No" and "Position" count
DT1 <- dfUniqueID[,.N, by=.(UniqueID, "Document No"=rleid(UniqueID))]
## extend UniqueIDs with Positions
DT2 <- dfUniqueID[,seq_len(.N), by = UniqueID] |>
setnames(old = "V1",new = "Position")
## Join UniqueID, Document Number, and Positions in one table
DT3 <- DT1[DT2,on = .(UniqueID)][,-c("N")]
## Bind SAP Upload table with the columns of Document No and Position from DT3 table
tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])
When I converted the code into a function, I only replaced the UniqueID by an input parameter in the function as shown. But it seems there's something wrong in referencing the tblSAP column (UniqueID) in the function. I receive the error of "Error in setorderv(x, cols, order, na.last) : some columns are not in the data.table: IdFld".
Here's the function that I wrote:
# function to generate Document No and Position fields
SAPDocPos <- function(tblSAP,IdFld){
## Sort entire tblSAP table
tblSAP <- setorder(tblSAP,cols=IdFld)
## create a temp table of IdFld
dfUniqueID <- tblSAP[,.(IdFld)]
## create "Document No" and "Position" count
DT1 <- dfUniqueID[,.N, by=.(IdFld, "Document No"=rleid(IdFld))]
## extend IdFlds with Positions
DT2 <- dfUniqueID[,seq_len(.N), by = IdFld] %>%
setnames(old = "V1",new = "Position")
## Join IdFlds, Document Number, and Positions in one table
DT3 <- DT1[DT2,on = .(IdFld)][,-c("N")]
## Bind SAP Upload table with the columns of Document No and Position from DT3 table
tblSAP <- cbind(tblSAP,DT3[,-c("UniqueID")])
return(tblSAP)
}
SAPDocPos(tblSAP,UniqueID)
Assistance in resolving this case is highly appreciated.
Upvotes: 1
Views: 79
Reputation: 160687
A few changes:
in order to use the symbol UniqueID
, we need to use deparse(substitute(.))
to convert it into a character; in general I tend to recommend against using non-standard evaluation (as you are attempting to do here), especially when the only value it adds is the omission of quotes ... however, I'll keep it here for demonstration
setorder
requires non-standard evaluation, but we can use setorderv
which expects a character vector
I'll use .SDcols=
to subset by columns instead of attempting .(UniqueID)
for DT1
, the by=
can also be a character vector, but that requires that `Document No`
be pre-defined, so this is updated a little
perhaps just a nuance, I shifted from %>%
to |>
to reduce dependencies; not required if you already depend on dplyr
or magrittr
elsewhere
just a preference, but the literal return(.)
statement is only needed when you need the function to return a value earlier in the code, such as in an if
block; since a function always returns the value of the last expression, we can omit it. (Calling return(tblSAP)
explicitly is fine but adds unnecessary function calls to the stack.)
SAPDocPos <- function(tblSAP,IdFld){
IdFld <- deparse(substitute(IdFld))
## Sort entire tblSAP table
tblSAP <- setorderv(tblSAP,cols=IdFld)
## create a temp table of IdFld
dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
## create "Document No" and "Position" count
DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
## extend IdFlds with Positions
DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
setnames(old = "V1", new = "Position")
## Join IdFlds, Document Number, and Positions in one table
DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
## Bind SAP Upload table with the columns of Document No and Position from DT3 table
tblSAP <- cbind(tblSAP, DT3[, -c(IdFld), with=FALSE])
tblSAP
}
SAPDocPos(tblSAP, UniqueID)
# UniqueID Document No Position
# <num> <int> <int>
# 1: 200 1 1
# 2: 200 1 2
# 3: 300 2 1
# 4: 300 2 2
# 5: 400 3 1
# 6: 400 3 2
If you want to go "all in" on data.table
's referential (in-place) semantics, you can add some set
calls (more-direct data.table
value-setting) to add the columns to the original frame. NOTE: this is in-line with data.table
but at odds with base-R's copy-on-write semantics.
SAPDocPos <- function(tblSAP,IdFld){
IdFld <- deparse(substitute(IdFld))
## Sort entire tblSAP table
tblSAP <- setorderv(tblSAP,cols=IdFld)
## create a temp table of IdFld
dfUniqueID <- tblSAP[,.SD, .SDcols = IdFld]
## create "Document No" and "Position" count
DT1 <- dfUniqueID[, `Document No` := rleid(get(IdFld))][, .N, by = c(IdFld, "Document No")]
## extend IdFlds with Positions
DT2 <- dfUniqueID[, seq_len(.N), by = IdFld] |>
setnames(old = "V1", new = "Position")
## Join IdFlds, Document Number, and Positions in one table
DT3 <- DT1[DT2, on = c(IdFld)][,-c("N")]
set(tblSAP, i = NULL, j = "Document No", value = DT3[["Document No"]])
set(tblSAP, i = NULL, j = "Position", value = DT3[["Position"]])
tblSAP
}
tblSAP
# UniqueID
# <num>
# 1: 200
# 2: 200
# 3: 300
# 4: 300
# 5: 400
# 6: 400
SAPDocPos(tblSAP, UniqueID) # not re-assigning over tblSAP
tblSAP
# UniqueID Document No Position
# <num> <int> <int>
# 1: 200 1 1
# 2: 200 1 2
# 3: 300 2 1
# 4: 300 2 2
# 5: 400 3 1
# 6: 400 3 2
Upvotes: 1