s.willis

Reputation: 389

How can I use gsub in a user-supplied input to xtable's sanitize.text.function argument?

I have a table of summary statistics created in R. Rows correspond to variables, columns to different samples. I want to export this table to LaTeX using the xtable package. However, some variables are on a much larger scale, so I would like to round those variables. I tried to do this by passing a user-supplied function to xtable's sanitize.text.function argument:

library(data.table)
library(xtable)

dt <- data.table(sample1 = c(1.11, 2222.22), sample2 = c(3.33, 44444.44))  # data for MWE
rownames(dt) <- c('var 1', 'var 2')
xt <- print(xtable(dt),
            format.args=list(big.mark=","),
            sanitize.text.function = \(x) gsub('([0-9]+,[0-9]{3})\\.[0-9]{2}', '\\1', x)) 

# OUTPUT:
# \begin{table}[ht]
# \centering
# \begin{tabular}{rrr}
#   \hline
#  & sample1 & sample2 \\ 
#   \hline
# var 1 & 1.11 & 3.33 \\ 
#   var 2 & 2,222.22 & 44,444.44 \\ 
#    \hline
# \end{tabular}
# \end{table}

However, the values are not rounded in the output table as I would like them to be. Calling the same gsub on the string returned by print.xtable works:

gsub('([0-9]+,[0-9]{3})\\.[0-9]{2}', '\\1', xt)
[1] "\\begin{table}[ht]\n\\centering\n\\begin{tabular}{rrr}\n  \\hline\n & sample1 & sample2 \\\\ \n  \\hline\nvar 1 & 1.11 & 3.33 \\\\ \n  var 2 & 2,222 & 44,444 \\\\ \n   \\hline\n\\end{tabular}\n\\end{table}\n"

So why doesn't this work directly in the call to print.xtable? Having to save the printed output and then call gsub on it is a pain.

Bonus: I am trying this approach since, as far as I can see, xtable only allows me to set the number of decimal places for a whole column, rather than for a whole row. Any approach that allows me to fix the number of decimal places for a row would also solve my problem.

Upvotes: 0

Views: 63

Answers (2)

s.willis

Reputation: 389

As pointed out in @ZéLoff's answer, I would need to convert the input columns to character to use sanitize.text.function, but then I cannot use big.mark for large numbers. An alternative is to pre-process the numeric columns using round, floor, or trunc, depending on the desired output. However, with no further processing, the print.xtable output still includes trailing zeros, even if I have rounded the inputs. It is better to use the arguments to formatC (passed via format.args) to get the processed numeric columns looking as I want in the output table.

For the example data, this looks like:

library(data.table)
library(xtable)

dt <- data.table(sample1 = c(1.11, 2222.22), sample2 = c(3.33, 44444.44))  # data for MWE
round_cols <- c(1, 2)
# round only the second row (the large-scale variable)
dt[2, (round_cols) := lapply(.SD, round), .SDcols = round_cols]
rownames(dt) <- c('var 1', 'var 2')
xt <- print(xtable(dt),
            format.args=list(big.mark = ",",
                             drop0trailing = TRUE),
            comment = FALSE)
\begin{table}[ht]
\centering
\begin{tabular}{rrr}
  \hline
 & sample1 & sample2 \\ 
  \hline
var 1 & 1.11 & 3.33 \\ 
  var 2 & 2,222 & 44,444 \\ 
   \hline
\end{tabular}
\end{table}
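
The effect of drop0trailing can also be checked outside xtable, since format.args is documented as a list of arguments for formatC. A minimal base-R sketch, assuming the fixed-notation, two-decimal formatting seen in the output above (the vector `vals` is made up for illustration):

```r
# Illustrative values: one with decimals, two that are already rounded
vals <- c(1.11, 2222, 44444)

# Without drop0trailing, the rounded values still show ".00"
with_zeros <- formatC(vals, format = "f", digits = 2, big.mark = ",")

# With drop0trailing, trailing zeros (and a bare decimal point) are removed
without_zeros <- formatC(vals, format = "f", digits = 2, big.mark = ",",
                         drop0trailing = TRUE)

print(with_zeros)
print(without_zeros)
```

Note that drop0trailing applies to every cell, so a value such as 1.10 in another row would also lose its trailing zero.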

Upvotes: 0

Zé Loff

Reputation: 1712

sanitize.text.function only applies to text, not numbers, so for it to work as you expect, you need to convert the columns to character vectors first. But then you can't use formatting features like big.mark.

Also, relying on string substitution for this isn't rounding, it's truncating: you only keep the integer part of the number, e.g., "1000.999999" becomes "1000". Personally, I'd use something like signif, or a more careful use of round, before passing the object to xtable:

library(xtable)

dt <- data.frame(sample1 = c(1.11, 2222.22), sample2 = c(3.33, 44444.44))  # data for MWE
dt[dt>1000] <- as.character(round(dt[dt>1000], 0))
rownames(dt) <- c('var 1', 'var 2')
xt <- print(xtable(dt),
            format.args=list(big.mark=","))
\begin{table}[ht]
\centering
\begin{tabular}{rll}
  \hline
 & sample1 & sample2 \\
  \hline
var 1 & 1.11 & 3.33 \\
  var 2 & 2222 & 44444 \\
   \hline
\end{tabular}
\end{table}

(note that the code above does not work on data.table, hence me using data.frame).
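
To make the truncation point concrete, here is a small base-R sketch that does not involve xtable at all. The value `x` is made up for illustration, and the formatC call is only assumed to mimic a two-decimal cell with a thousands separator:

```r
x <- 44444.99  # illustrative value, not from the question's data

# Format the number as a two-decimal string with a thousands separator
formatted <- formatC(x, format = "f", digits = 2, big.mark = ",")

# The question's regex simply deletes the decimals: this is truncation
truncated <- gsub("([0-9]+,[0-9]{3})\\.[0-9]{2}", "\\1", formatted)

# Rounding the number before formatting gives a different result
rounded <- formatC(round(x), format = "f", digits = 0, big.mark = ",")

print(c(truncated = truncated, rounded = rounded))
```

The two strings differ by one unit here, which is why rounding the numbers before formatting is the safer route.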

Nevertheless, as you can see from the xtable output above, either you let xtable treat everything as numbers, in order to use big.mark, and have it introduce decimal places, or you convert all columns to character and format them yourself before feeding the data.frame to xtable. It's "all or nothing" with xtable.

On a slightly unrelated note, the siunitx LaTeX package has some options for tabular material (including automatic rounding and not adding decimal zeros to integers) that you might want to check out.
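
If you go the "format them yourself" route, one possible way to get a different number of decimals per row is to build the strings with formatC before calling xtable. This is only a sketch: `digits_by_row` and `fmt` are hypothetical names, and the choice of 2 and 0 decimals is an assumption based on the question's example.

```r
dt <- data.frame(sample1 = c(1.11, 2222.22), sample2 = c(3.33, 44444.44),
                 row.names = c("var 1", "var 2"))

# Hypothetical choice: 2 decimals for row 1, 0 decimals for row 2
digits_by_row <- c(2, 0)

# Format each column element-wise, pairing every value with its row's digits
fmt <- dt
fmt[] <- lapply(dt, function(col) {
  mapply(function(v, d) formatC(v, format = "f", digits = d, big.mark = ","),
         col, digits_by_row)
})

print(fmt)
```

The resulting character data.frame can then be passed to xtable as in the code above. Since character columns come out left-aligned by default (the `{rll}` in the output above), you would probably also want to set the alignment yourself, e.g. through xtable's align argument.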

Upvotes: 1
