data.table: transforming subset of columns with a function, row by row

Question

Given a data.table with mostly numeric values, how can I transform just a subset of columns and write the results back into the original table? I don't want to add any summary statistic as a separate column; I just want to replace the original columns with their transformed versions.

Assume we have a data.table DT with 1 column of names and 10 columns of numeric values. I want to apply the base R function scale to each row of that table, but only across those 10 numeric columns.

To expand on this: what if the data.table has more columns, and I need to use column names to tell scale which values to operate on?

With a regular data.frame I would just do:

df[,grep("keyword",colnames(df))] <- t(apply(df[,grep("keyword",colnames(df))],1,scale))

I know this looks cumbersome, but it has always worked for me. However, I can't figure out a simple way to do it with a data.table.

I would imagine something like this would work for a data.table:

dt[,grep("keyword",colnames(dt)) := scale(grep("keyword",colnames(dt)),center=F)]

But it doesn't.

EDIT:

Another way of updating the columns with their per-row-scaled versions (here dt is a data.table object):

dt[,grep("keyword",colnames(dt),value=T) := as.data.table(t(apply(dt[,grep("keyword",colnames(dt)),with=F],1,scale)))]

Too bad it needs the as.data.table call inside, since the transposed result of apply is a matrix. Maybe data.table should automatically coerce matrices into data.tables when updating columns?
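To make the shape issue concrete, here is a minimal sketch of the same update on a toy table. The table contents and the helper variables `cols` and `scaled` are made up for illustration; only the "keyword" pattern comes from the question.

```r
library(data.table)

# Toy data (hypothetical): one name column plus three "keyword" columns
dt <- data.table(
  name      = c("a", "b"),
  keyword_x = c(1, 10),
  keyword_y = c(2, 20),
  keyword_z = c(3, 30)
)
cols <- grep("keyword", colnames(dt), value = TRUE)

# apply() over rows returns one column per input row, so transpose
# to get back one row per input row; the result is a plain matrix
scaled <- t(apply(dt[, cols, with = FALSE], 1, scale))

# := expects a list of columns, so convert the matrix first
dt[, (cols) := as.data.table(scaled)]
```

Wrapping `cols` in parentheses on the left-hand side tells data.table to treat it as a vector of column names rather than as a new column called `cols`.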

Cath · Accepted Answer

If you really need to scale by row, you can do it in two steps:

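# Step 1: compute the per-row mean and sd across the selected columns
# (the unnamed results come back as columns V1 and V2)
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))), by=1:nrow(DT), .SDcols=grep("keyword",colnames(DT))]

# Step 2: replace each selected column with (value - row mean) / row sd
DT[, grep("keyword",colnames(DT), value=TRUE) := lapply(.SD, function(x) (x-mean_sd$V1)/mean_sd$V2), .SDcols=grep("keyword",colnames(DT))]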
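As a quick check, the two steps can be run on a small table and compared with the data.frame-style t(apply(..., 1, scale)) result from the question. This is only a sketch: the toy data and the variable names `cols` and `expected` are assumptions made for illustration.

```r
library(data.table)

# Toy data (hypothetical): a name column, three "keyword" columns, one unrelated column
DT <- data.table(
  name      = c("a", "b", "c"),
  keyword_1 = c(1, 4, 7),
  keyword_2 = c(2, 6, 8),
  keyword_3 = c(3, 8, 12),
  other     = c(10, 20, 30)
)
cols <- grep("keyword", colnames(DT), value = TRUE)

# Reference result from the row-wise apply/scale approach, computed before DT is modified
expected <- t(apply(as.matrix(DT[, cols, with = FALSE]), 1, scale))

# Step 1: per-row mean (V1) and sd (V2) over the selected columns
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))), by = 1:nrow(DT), .SDcols = cols]

# Step 2: overwrite each selected column with its row-scaled values
DT[, (cols) := lapply(.SD, function(x) (x - mean_sd$V1) / mean_sd$V2), .SDcols = cols]

# Compare the two approaches, ignoring dimnames
all.equal(unname(as.matrix(DT[, cols, with = FALSE])), unname(expected))
```

Because `mean_sd` has one row per row of DT, in the same order, the vectors `mean_sd$V1` and `mean_sd$V2` line up element by element with each column inside lapply.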

Answers (2)

PART 1: The one line solution you requested:

PART 2: A Step-by-Step Solution: (more general and easier to follow)

Here's the step-by-step way of doing the same:

Get the data into data.table format:

Then, handle the column names:

Define the function you want to apply:

After that, it is trivial in data.table syntax:

Verify:
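A minimal sketch of how these steps might look, under stated assumptions: the data frame `df`, the "keyword" pattern reused from the question, and the helper name `scale_row` are illustrative and are not taken from the answer itself.

```r
library(data.table)

# Hypothetical input: a plain data.frame with a name column and "keyword" columns
df <- data.frame(
  name      = c("a", "b", "c"),
  keyword_1 = c(1, 4, 7),
  keyword_2 = c(2, 6, 8),
  keyword_3 = c(3, 8, 12),
  other     = c(10, 20, 30)
)

# Get the data into data.table format
DT <- as.data.table(df)

# Handle the column names: select the columns to transform by pattern
cols <- grep("keyword", names(DT), value = TRUE)

# Define the function to apply to one row
# (scale() returns a one-column matrix, so flatten it to a plain vector)
scale_row <- function(x) as.numeric(scale(x))

# Apply it row by row and write the result back over the same columns
DT[, (cols) := as.data.table(t(apply(.SD, 1, scale_row))), .SDcols = cols]

# Verify: after scaling, each row should have mean close to 0 and sd close to 1
scaled    <- as.matrix(DT[, cols, with = FALSE])
row_means <- rowMeans(scaled)
row_sds   <- apply(scaled, 1, sd)
```

Keeping the row function separate makes it easy to swap in a different transformation, as long as it returns one value per selected column.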
