Gedara Home

Reputation: 47

Error with outlier removal using data.table

I am trying to remove outliers within each group (Comparison) using the following code:

dat_outlier = dat
setDT(dat_outlier)

for (j in col_names){
  
  dat_outlier[, (j):= ifelse(!dat_outlier[[j]] %in% boxplot.stats(dat_outlier[[j]])$out,dat_outlier[[j]],NA), by=Comparison]
  
}

However, I get the following error. What causes it, and how can I fix it?

Error in `[.data.table`(dat_outlier, , `:=`((noquote(j)), ifelse(!dat_outlier[[j]] %in%  : 
  Supplied 62 items to be assigned to group 1 of size 9 in column 'CRP'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

I wrote this code by modifying code from another question thread.

Upvotes: 0

Views: 55

Answers (1)

Ronak Shah

Reputation: 389355

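The issue is that dat_outlier[[j]] returns every value in the column (62 items in your case) rather than only the values for the current Comparison group (9 in the first group), so the right-hand side does not match the group size. You may try using lapply over .SD, which passes each column's values for the current group only: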

library(data.table)

dat_outlier = dat
setDT(dat_outlier)

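# x holds one column's values for the current Comparison group only;
# values that boxplot.stats() flags as outliers within that group become NA
dat_outlier[, (col_names) := lapply(.SD, function(x) ifelse(x %in% boxplot.stats(x)$out, NA, x)),
            by = Comparison, .SDcols = col_names]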
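
To check the per-group behaviour on something small, here is a minimal sketch using made-up data. The values in dat are hypothetical and only the column names Comparison and CRP come from the question. It uses replace() instead of ifelse(), which is a substitution on my part that keeps the column type unchanged.

```r
library(data.table)

# Hypothetical toy data: two groups, each with one obvious outlier in CRP
dat <- data.frame(
  Comparison = rep(c("A", "B"), each = 6),
  CRP = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 200)
)
col_names <- "CRP"

# as.data.table() builds a new data.table from the data.frame,
# so the original dat is not modified by := below
dat_outlier <- as.data.table(dat)

# For each Comparison group, set values flagged by boxplot.stats() to NA
dat_outlier[, (col_names) := lapply(.SD, function(x) {
  replace(x, x %in% boxplot.stats(x)$out, NA)
}), by = Comparison, .SDcols = col_names]

print(dat_outlier)
```

Because the function only ever sees x, the values for one group, the result always has the same length as the group, which avoids the "Supplied 62 items to be assigned to group 1 of size 9" mismatch.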

Upvotes: 1
