Reputation: 1
I have a list of 51 data frames (about 90 MB altogether) holding the mV readings of some sensors (columns 2:7):
str(BoxlistF)
List of 51
$ :'data.frame': 33507 obs. of 7 variables:
..$ Date.Box : POSIXct[1:33507], format: "2013-01-01 00:15:00" ...
..$ Slnt.grn.00x.Box.001.SPADE.1: num [1:33507] 1811 1811 1810 1811 1810 ...
..$ Slnt.grn.00x.Box.001.SPADE.2: num [1:33507] 1739 1739 1737 1737 1736 ...
..$ Slnt.grn.00x.Box.001.SPADE.3: num [1:33507] 1634 1635 1634 1634 1637 ...
..$ Slnt.grn.00x.Box.001.SPADE.4: num [1:33507] 1572 1576 1576 1575 1576 ...
..$ Slnt.grn.00x.Box.001.SPADE.5: num [1:33507] 1660 1660 1659 1660 1659 ...
..$ Slnt.grn.00x.Box.001.SPADE.6: num [1:33507] 1454 1450 1453 1450 1451 ...
To remove measurement errors, I would like to use this ifelse function:
tt<-30
tb=-10
t<-2100
b<-800
l.new1<-lapply(BoxlistF, function(x){ x[,2:7]<-lapply(x[,2:7],function(x) ifelse(x<b|x>t,x<-NA,x))})
Another attempt was to exclude the date column from the function:
lapply(BoxlistF, function(x) ifelse(x[,-1]<b|x[,-1]>t,x[,-1]<-NA,x[,-1]))
Since my column names vary and are complicated, I want to address the columns by index rather than by name.
I don't know whether this function works properly, because R aborts it with memory errors:
Error: cannot allocate vector of size 131 Kb
In addition: Warning messages:
1: In ifelse(x[, -1] < b | x[, -1] > t, x[, -1] <- NA, x[, -1]) :
Reached total allocation of 8147Mb: see help(memory.size)
2: In ifelse(x[, -1] < b | x[, -1] > t, x[, -1] <- NA, x[, -1]) :
Reached total allocation of 8147Mb: see help(memory.size)
Called from: top level
Error during wrapup: cannot allocate vector of size 512 Kb
Error during wrapup: target context is not on the stack
In the early stages I had a for loop like this:
for(i in 1:(length(BoxlistF)))
{for(j in 1:6)
{for(e in 1:(nrow(BoxlistF[[i]])))
{
ifelse((!is.na(BoxlistF[[i]][e,1+j])<b|!is.na(BoxlistF[[i]][e,1+j])>t),BoxlistF[[i]][e,1+j]<-NA,BoxlistF[[i]][e,1+j])
ifelse((!is.na(BoxlistT[[i]][e,1+j])<tb|!is.na(BoxlistT[[i]][e,1+j])>tt),BoxlistT[[i]][e,1+j]<-NA,BoxlistT[[i]][e,1+j])
}
}
}
This worked well on monthly data, but on this big data set (one year) I guess it's not an option.
Looking for ways to solve memory issues in loops, I found this post: "Speed up the loop operation in R". I tried to follow its instructions and came up with this:
l.new1<-lapply(BoxlistF,function(x)
{ res <- numeric(nrow(x))
for(ff in 2:7)
{
for(cc in 1:nrow(BoxlistF[[1]]))
{ifelse(x[cc,ff]<b|x[cc,ff]>t,res[cc]<-NA,res[cc]<-x[cc,ff])
x[,ff]<-res
return(x)
}}})
That function ran quickly, but I don't understand the outcome:
Date.Box Slnt.grn.00x.Box.001.SPADE.1 Slnt.grn.00x.Box.001.SPADE.2 Slnt.grn.00x.Box.001.SPADE.3 Slnt.grn.00x.Box.001.SPADE.4 Slnt.grn.00x.Box.001.SPADE.5 Slnt.grn.00x.Box.001.SPADE.6
1 2013-01-01 00:15:00 1811 1739 1634 1572 1660 1454
2 2013-01-01 01:09:00 0 1739 1635 1576 1660 1450
3 2013-01-01 02:03:00 0 1737 1634 1576 1659 1453
4 2013-01-01 02:57:00 0 1737 1634 1575 1660 1450
5 2013-01-01 03:51:00 0 1736 1637 1576 1659 1451
6 2013-01-01 04:46:00 0 1739 1634 1575 1659 1451
7 2013-01-01 05:40:00 0 1734 1643 1576 1660 1450
8 2013-01-01 06:34:00 0 1734 1634 1576 1660 1449
9 2013-01-01 07:28:00 0 1734 1643 1572 1660 1447
10 2013-01-01 08:22:00 0 1734 1634 1576 1657 1448
I hope the information provided is sufficient; if not, tell me and I will add whatever is missing.
Upvotes: 0
Views: 58
Reputation: 2384
Don't loop in R. It's not C. Here is an example of how to replace certain elements in a data frame efficiently, using a logical matrix as the index:
> testdat <- data.frame(matrix(1:16,ncol=4))
> testdat == 6 | testdat == 9
X1 X2 X3 X4
[1,] FALSE FALSE TRUE FALSE
[2,] FALSE TRUE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE
> testdat[testdat == 6 | testdat == 9] <- 999
> testdat
X1 X2 X3 X4
1 1 5 999 13
2 2 999 10 14
3 3 7 11 15
4 4 8 12 16
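The same logical indexing carries over to range checks on a list of data frames. Below is a minimal sketch, assuming a small made-up list `boxes` as a stand-in for `BoxlistF` (date in column 1, sensor readings in the remaining columns) and the thresholds `b` and `t` from the question. The column names `s1` and `s2` are hypothetical, and the sensor columns are addressed by index only.

```r
# Toy stand-in for BoxlistF (names and values are made up for illustration):
# date in column 1, sensor readings in the remaining columns.
boxes <- list(
  data.frame(Date = as.POSIXct("2013-01-01 00:15:00", tz = "UTC") + 0:2 * 3600,
             s1 = c(1811, 500, 1810),
             s2 = c(1739, 1739, 2500)),
  data.frame(Date = as.POSIXct("2013-01-01 00:15:00", tz = "UTC") + 0:2 * 3600,
             s1 = c(799, 1634, 1635),
             s2 = c(2100, 800, 2101))
)

b <- 800    # lower bound, as in the question
t <- 2100   # upper bound, as in the question

cleaned <- lapply(boxes, function(df) {
  vals <- df[-1]                    # all columns except the date, by index
  vals[vals < b | vals > t] <- NA   # logical-matrix replacement, no ifelse
  df[-1] <- vals                    # write the cleaned columns back
  df                                # return the whole data frame
})
```

Each comparison runs once per data frame on whole columns, so there is no per-cell function call. The last expression of the inner function is `df`, so `lapply` returns complete data frames including the date column.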
Upvotes: 0
Reputation: 1
I can't say that I understand in detail why this one works, but luckily it does:
BoxlistF<-lapply(BoxlistF, function(x) x<-cbind(x["Date.Box"], apply(x[,-1],c(1,2),function(x)ifelse(x<b|x>t,x<-NA,x))))
So if I want to check every column and row, do I have to use apply? Or is something like this also possible with lapply?
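For comparison, here is a minimal sketch of the same cleaning done with nested `lapply` calls instead of `apply`. It assumes a small made-up list `boxes` in place of `BoxlistF` (the column names `s1` and `s2` are hypothetical). The key point is that the outer function has to return the data frame as its last expression; if the assignment is the last expression, `lapply` collects the assigned value instead of the data frame.

```r
# Made-up stand-in for BoxlistF: one data frame, date in column 1.
boxes <- list(
  data.frame(Date = as.POSIXct("2013-01-01 00:15:00", tz = "UTC") + 0:3 * 3600,
             s1 = c(1811, 700, 1810, 1811),
             s2 = c(1739, 1739, 2200, 1737))
)

b <- 800    # lower bound, as in the question
t <- 2100   # upper bound, as in the question

cleaned <- lapply(boxes, function(df) {
  # Inner lapply walks over the sensor columns (everything but column 1).
  df[-1] <- lapply(df[-1], function(col) {
    col[col < b | col > t] <- NA   # treat the whole column at once
    col                            # return the cleaned column
  })
  df                               # return the data frame, not the assignment
})
```

Here the inner function works on one whole column per call, whereas `apply(x[,-1], c(1,2), ...)` calls its function once per cell, which is likely to be noticeably slower on 33507 rows.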
Upvotes: 0