Reputation:
When in interactive mode, exploring aggregations in data.table, I may do dozens, or hundreds of experiments. The default column names after aggregations (V1
, V2
, etc) obviously aren't very informative, after writing the minimal amount of code needed to generate the aggregations. Often, I'd be happier with the default column name of a simple aggregation of one column, like a mean or sum, to be just the name of the underlying variable.
All the extra column name typing gets tiring, and I want to avoid this.
Is there any easy way to do this in data.table
?
e.g. a simplified example to demonstrate if it's not clear:
DT = data.table(x =rep(c("b","a","c"),each=3), y_a_long_name=c(1,3,6), v_a_long_name=1:9)
DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]
# x V1 V2
# 1: b 6 3.333333
# 2: a 15 3.333333
# 3: c 24 3.333333
When you start working with several columns, using different aggregation type functions, the above labeling, V1
, V2
, isn't helpful.
All the extra typing of repeating the names is tedious, but I'd like to see something like this:
DT[, .(v_a_long_name = sum(v_a_long_name), y_a_long_name = mean(y_a_long_name)), by = x]
# x v_a_long_name y_a_long_name
# 1: b 6 3.333333
# 2: a 15 3.333333
# 3: c 24 3.333333
while typing as minimal as possible. e.g. it would be ideal if
DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]
printed this by default:
# x v_a_long_name y_a_long_name
# 1: b 6 3.333333
# 2: a 15 3.333333
# 3: c 24 3.333333
Upvotes: 1
Views: 1267
Reputation: 93938
A bit of a roundabout solution, but you can pass a list of functions and a list of columns to Map
+ do.call
. The names for the resulting columns will be taken from the first argument to Map
, which means you only have to specify the names in long-form in .SDcols
once:
DT[,
Map(do.call, args=lapply(.SD,list), what=c(sum,mean)),
by=x, .SDcols=c("v_a_long_name","y_a_long_name")
]
# x v_a_long_name y_a_long_name
#1: b 6 3.333333
#2: a 15 3.333333
#3: c 24 3.333333
If the auto-naming is required, you could define the variables ahead of time:
nms <- c("v_a_long_name","y_a_long_name")
funs <- c("sum","mean")
DT[,
setNames(Map(do.call, args=lapply(.SD,list), what=funs), paste(nms,funs,sep="_")),
by=x, .SDcols=nms
]
# x v_a_long_name_sum y_a_long_name_mean
#1: b 6 3.333333
#2: a 15 3.333333
#3: c 24 3.333333
Upvotes: 0
Reputation: 887651
We can use setNames
to wrap around the list
of columns
DT[, setNames(.(sum(v_a_long_name), mean(y_a_long_name)), names(DT)[2:3]), by = x]
# x y_a_long_name v_a_long_name
#1: b 6 3.333333
#2: a 15 3.333333
#3: c 24 3.333333
Or with setnames
after getting the output
setnames(DT[, .(sum(v_a_long_name), mean(y_a_long_name)),
by = x], 2:3, names(DT)[2:3])[]
Or make it more compact by extracting the columns from .SD
setnames(DT[, .(sum(.SD[[2]]), mean(.SD[[1]])), by = x], 2:3, names(DT)[2:3])[]
Upvotes: 2