I am trying to use lapply to trim some of my data. I want to trim columns 2:4 (deleting the outliers or extreme values) and also remove the corresponding rows across all columns.
Below is some data with outliers in each column. I want to remove the values 100 and -100 in V1 and also remove the whole row from the data. Likewise, I want to remove the values 80 and -80 in column V2, along with their rows.
trimdata <- NULL
trimdata$ID <- seq.int(102)
trimdata$V1 <- c(rnorm(100), 100, -100)
trimdata$V2 <- c(rnorm(100), 80, -80)
trimdata$V3 <- c(rnorm(100), 120, -120)
trimdata <- as.data.frame(trimdata)
library(DescTools)
trimdata <- lapply(trimdata, function(x) Trim(x, trim = 0.01))
trimdata <- as.data.frame(trimdata)
The code above applies the function to every column, so it also trims the extreme values from the ID column.
This code:
trimdata[2:4] <- lapply(trimdata[2:4], function(x) Trim(x, trim = 0.01))
returns the following error:
Error in `[<-.data.frame`(`*tmp*`, 2:4, value = list(V1 = c(0.424725933773568, :
replacement element 1 has 98 rows, need 100
So I am trying to trim based on columns 2:4 but also apply it to column 1.
You can't assign the result back into trimdata because Trim removes elements, so the trimmed vectors no longer have the same length as the columns they are meant to replace.
Here is an example:
x <- rnorm(10)
length(x)
[1] 10
length(Trim(x, trim=0.1))
[1] 8
Before calling Trim you have 10 elements; afterwards you have only 8.
In your example Trim removes 2 elements from each column, which is what the error message describes:
replacement element 1 has 98 rows, need 100
From the Trim documentation:
A symmetrically trimmed vector x with a fraction of trim observations (resp. the given number) deleted from each end will be returned.
In your example, two rows are trimmed from each column, and the trimmed rows differ from column to column, as you can see:
trim_out<-lapply(trimdata[2:4], function(x) Trim(x, trim = 0.01))
lapply(trim_out, attributes)
$V1
$V1$trim
[1] 56 57
$V2
$V2$trim
[1] 63 47
$V3
$V3$trim
[1] 90 74
If you want a cleaned data.frame as output, you can remove all of these rows from trimdata like this:
trimdata[-unique(unlist(lapply(trim_out, attributes))),]
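The same idea can be sketched in base R without relying on the attributes that Trim returns. This is only an illustration under stated assumptions: extreme_idx is a hypothetical helper (not part of DescTools) that returns the positions of the k smallest and k largest values of a column, and k = 1 is chosen to match the single value trimmed from each end in the example data.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Example data in the same shape as the question: two planted outliers per column.
trimdata <- data.frame(
  ID = seq_len(102),
  V1 = c(rnorm(100), 100, -100),
  V2 = c(rnorm(100), 80, -80),
  V3 = c(rnorm(100), 120, -120)
)

# Hypothetical helper (an assumption for illustration, not a DescTools function):
# positions of the k smallest and k largest values of x.
extreme_idx <- function(x, k = 1) {
  ord <- order(x)
  c(head(ord, k), tail(ord, k))
}

# Collect the extreme row positions from columns 2:4 only, then drop each
# of those rows from the whole data frame (ID included).
drop_rows <- unique(unlist(lapply(trimdata[2:4], extreme_idx)))
cleaned <- trimdata[setdiff(seq_len(nrow(trimdata)), drop_rows), ]

nrow(cleaned)
```

Using setdiff on the row positions rather than negative indexing avoids the case where an empty drop_rows would select zero rows. Because rows are dropped whenever any of the three columns holds an extreme value, the result can lose more than two rows when the outliers sit in different rows for different columns.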