JohnL_10

Reputation: 569

R: How to select columns in a data table depending on column variance?

I normally work with data frames, but I have recently been trying to get the hang of data.table for speed. It has been incredibly useful for some recent large files.

Anyway, I have a function that removes invariant columns (those with only one unique value) from my data frame once I have read it in.

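rm_invariant_cols = function(df) {

     ## keep only the columns with more than one unique value;
     ## drop = FALSE keeps the result a data frame even if a single column remains
     df = df[, sapply(df, function(x) length(unique(x)) > 1), drop = FALSE]
     return(df)

}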

How can I achieve this when df is a data.table? When I run the same function on a data.table, I get back a vector of logicals (one per column) rather than the columns themselves.

NB. I have read the vignette, which doesn't seem to cover this.

Thanks,

John

Upvotes: 0

Views: 196

Answers (1)

akuiper

Reputation: 215047

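Inside a data.table, j is evaluated as an expression by default, which is why you get the logical vector back. To have it treated as a column selector instead, you can use with=FALSE: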

library(data.table)

dt <- data.table(A = 1:3, B = c(1,1,1), C = c(2,1,3), D = c(2,2,2))

dt
#   A B C D
#1: 1 1 2 2
#2: 2 1 1 2
#3: 3 1 3 2

dt[, sapply(dt, uniqueN) > 1, with=FALSE]
#   A C
#1: 1 2
#2: 2 1
#3: 3 3

Or, perhaps more idiomatically, as suggested by @thelatemail:

dt[, .SD, .SDcols=lapply(dt, uniqueN) > 1]

#   A C
#1: 1 2
#2: 2 1
#3: 3 3
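
If you want a drop-in replacement for the function in the question, something along these lines might work. This is only a sketch: the name rm_invariant_cols_dt is made up for illustration, and it assumes the input is already a data.table.

```r
library(data.table)

# Hypothetical helper mirroring rm_invariant_cols() from the question
rm_invariant_cols_dt <- function(dt) {
  # TRUE for every column that has more than one distinct value
  keep <- vapply(dt, function(x) uniqueN(x) > 1L, logical(1))
  # .SDcols accepts the logical vector and selects those columns
  dt[, .SD, .SDcols = keep]
}

dt <- data.table(A = 1:3, B = c(1, 1, 1), C = c(2, 1, 3), D = c(2, 2, 2))
res <- rm_invariant_cols_dt(dt)
names(res)
```

With the example table above this should leave only columns A and C, and the original dt is not modified.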

Upvotes: 4
