I normally work with data frames, but I've recently been trying to get the hang of data.tables for speed. They have been incredibly useful for some recent files.
Anyway, once a file has been read in, I use the following function to drop columns with no variance (i.e. only one unique value) from my data frame:
rm_invariant_cols = function(df) {
  ## keep only columns with more than one unique value;
  ## drop = FALSE keeps a data frame even if a single column survives
  df = df[, sapply(df, function(x) length(unique(x)) > 1), drop = FALSE]
  return(df)
}
How can I achieve this when df is a data.table? When I run the same function on a data.table, I get a logical vector (one value per column) rather than the columns themselves.
NB. I have read the vignette, which doesn't seem to cover this.
Thanks,
John
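You can use with = FALSE. By default, j in dt[i, j] is evaluated as an expression within the data.table, which is why your call returns the logical vector itself; with = FALSE tells data.table to treat j as a column selector, the way [.data.frame does: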
library(data.table)
dt <- data.table(A = 1:3, B = c(1,1,1), C = c(2,1,3), D = c(2,2,2))
dt
#   A B C D
#1: 1 1 2 2
#2: 2 1 1 2
#3: 3 1 3 2
dt[, sapply(dt, uniqueN) > 1, with=FALSE]
#   A C
#1: 1 2
#2: 2 1
#3: 3 3
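To mirror the function from the question, the with = FALSE form can be wrapped up like this. It is only a sketch: it reuses the question's name rm_invariant_cols for a data.table version, and it returns a new data.table while leaving the input untouched.

```r
library(data.table)

# Sketch: data.table counterpart of the question's rm_invariant_cols().
rm_invariant_cols <- function(dt) {
  keep <- sapply(dt, uniqueN) > 1   # TRUE for columns with more than one distinct value
  dt[, keep, with = FALSE]          # with = FALSE: treat j as a column selector
}

dt <- data.table(A = 1:3, B = c(1, 1, 1), C = c(2, 1, 3), D = c(2, 2, 2))
rm_invariant_cols(dt)
```

Because keep is evaluated in the function's own frame, a column in dt that happens to be called keep does not interfere.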
Or, perhaps more idiomatically, as suggested by @thelatemail:
dt[, .SD, .SDcols=lapply(dt, uniqueN) > 1]
#   A C
#1: 1 2
#2: 2 1
#3: 3 3
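Since speed was the motivation, another option (not part of the answer above) is to delete the invariant columns by reference with :=, which avoids copying the columns that are kept. A minimal sketch, assuming you are happy for the data.table to be modified in place; drop_invariant_cols is a hypothetical name:

```r
library(data.table)

# Hypothetical helper: deletes invariant columns by reference, no copy is made.
drop_invariant_cols <- function(dt) {
  invariant <- names(dt)[sapply(dt, uniqueN) <= 1]  # columns with at most one distinct value
  if (length(invariant) > 0) {
    dt[, (invariant) := NULL]  # := NULL removes the named columns in place
  }
  invisible(dt)
}

dt <- data.table(A = 1:3, B = c(1, 1, 1), C = c(2, 1, 3), D = c(2, 2, 2))
drop_invariant_cols(dt)
names(dt)  # "A" "C"
```

Note that this changes dt itself rather than returning a filtered copy; the length check simply skips the deletion when no column is invariant.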