user7613376
user7613376

Reputation:

Changing default column names in R data.table aggregations

When in interactive mode, exploring aggregations in data.table, I may do dozens, or hundreds of experiments. The default column names after aggregations (V1, V2, etc) obviously aren't very informative, after writing the minimal amount of code needed to generate the aggregations. Often, I'd be happier with the default column name of a simple aggregation of one column, like a mean or sum, to be just the name of the underlying variable.

All the extra column name typing gets tiring, and I want to avoid this.

Is there any easy way to do this in data.table?

e.g. a simplified example to demonstrate if it's not clear:

DT = data.table(x =rep(c("b","a","c"),each=3), y_a_long_name=c(1,3,6), v_a_long_name=1:9)

DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]
#    x V1       V2
# 1: b  6 3.333333
# 2: a 15 3.333333
# 3: c 24 3.333333

When you start working with several columns, using different aggregation type functions, the above labeling, V1, V2, isn't helpful.

All the extra typing of repeating the names is tedious, but I'd like to see something like this:

DT[, .(v_a_long_name = sum(v_a_long_name), y_a_long_name = mean(y_a_long_name)), by = x]
#    x v_a_long_name y_a_long_name
# 1: b             6      3.333333
# 2: a            15      3.333333
# 3: c            24      3.333333

while typing as minimal as possible. e.g. it would be ideal if

DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]

printed this by default:

#    x v_a_long_name y_a_long_name
# 1: b             6      3.333333
# 2: a            15      3.333333
# 3: c            24      3.333333

Upvotes: 1

Views: 1267

Answers (2)

thelatemail
thelatemail

Reputation: 93938

A bit of a roundabout solution, but you can pass a list of functions and a list of columns to Map + do.call. The names for the resulting columns will be taken from the first argument to Map, which means you only have to specify the names in long-form in .SDcols once:

DT[, 
  Map(do.call, args=lapply(.SD,list), what=c(sum,mean)),
  by=x, .SDcols=c("v_a_long_name","y_a_long_name")
]

#   x v_a_long_name y_a_long_name
#1: b             6      3.333333
#2: a            15      3.333333
#3: c            24      3.333333

If the auto-naming is required, you could define the variables ahead of time:

nms  <- c("v_a_long_name","y_a_long_name")
funs <- c("sum","mean")

DT[, 
  setNames(Map(do.call, args=lapply(.SD,list), what=funs), paste(nms,funs,sep="_")),
  by=x, .SDcols=nms
]

#   x v_a_long_name_sum y_a_long_name_mean
#1: b                 6           3.333333
#2: a                15           3.333333
#3: c                24           3.333333

Upvotes: 0

akrun
akrun

Reputation: 887651

We can use setNames to wrap around the list of columns

DT[, setNames(.(sum(v_a_long_name), mean(y_a_long_name)), names(DT)[2:3]), by = x]
#   x y_a_long_name v_a_long_name
#1: b             6      3.333333
#2: a            15      3.333333
#3: c            24      3.333333

Or with setnames after getting the output

setnames(DT[, .(sum(v_a_long_name), mean(y_a_long_name)),
                         by = x], 2:3, names(DT)[2:3])[]

Or make it more compact by extracting the columns from .SD

setnames(DT[, .(sum(.SD[[2]]), mean(.SD[[1]])), by = x], 2:3, names(DT)[2:3])[]

Upvotes: 2

Related Questions