kelamahim

Reputation: 587

Removing zeros and adding them back in time series

I have the following data

library(xts)
values<-c(2,2,2,4,2,3,0,0,0,0,0,1,2,3,2)
time1<-seq(from=as.POSIXct("2013-01-01 00:00"),to=as.POSIXct("2013-01-01 14:00"),by="hour")
data<-xts(values,order.by=time1)
data

                    [,1]
2013-01-01 00:00:00    2
2013-01-01 01:00:00    2
2013-01-01 02:00:00    2
2013-01-01 03:00:00    4
2013-01-01 04:00:00    2
2013-01-01 05:00:00    3
2013-01-01 06:00:00    0
2013-01-01 07:00:00    0
2013-01-01 08:00:00    0
2013-01-01 09:00:00    0
2013-01-01 10:00:00    0
2013-01-01 11:00:00    1
2013-01-01 12:00:00    2
2013-01-01 13:00:00    3
2013-01-01 14:00:00    2

Now I want to remove all the zeros, which can easily be done with:

remove_zerro = apply(data, 1, function(row) all(row !=0 ))
data[remove_zerro,]

The problem is that after I work with the zero-free data and make some modifications, I want to insert the zeros back into my data at their original dates and times. Any ideas would be appreciated.

Upvotes: 2

Views: 115

Answers (4)

dk14

Reputation: 22374

It seems like you might want to work with sparse vectors/matrices:

install.packages("spam")
library(spam)
sx <- c(0,0,3, 3.2, 0,0,0,-3:1,0,0,2,0,0,5,0,0)
apply.spam(spam(sx), NULL, function(x){1 / x})
           [,1]
 [1,]  0.0000000
 [2,]  0.0000000
 [3,]  0.3333333
 [4,]  0.3125000
 [5,]  0.0000000
 [6,]  0.0000000
 [7,]  0.0000000
 [8,] -0.3333333
 [9,] -0.5000000
[10,] -1.0000000
[11,]  0.0000000
[12,]  1.0000000
[13,]  0.0000000
[14,]  0.0000000
[15,]  0.5000000
[16,]  0.0000000
[17,]  0.0000000
[18,]  0.2000000
[19,]  0.0000000
[20,]  0.0000000

If you applied the same function to an ordinary matrix, zeros included:

> apply(matrix(sx), 1, function(x){1 / x})
 [1]        Inf        Inf  0.3333333  0.3125000        Inf        Inf
 [7]        Inf -0.3333333 -0.5000000 -1.0000000        Inf  1.0000000
[13]        Inf        Inf  0.5000000        Inf        Inf  0.2000000
[19]        Inf        Inf

So you can see that apply.spam ignores the zeros but puts them back automatically.

The disadvantage is that you'll have to reattach your time labels after processing.
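
One way to reattach the time labels, sketched here without spam and assuming the processing only needs the plain numeric values: pull the values out with coredata(), modify the non-zero positions, and rebuild the series on the original index(). The `+ 10` step is only a placeholder for the real modification, and the names `vals`, `keep`, and `result` are illustrative.

```r
library(xts)

# Same series as in the question (timezone fixed to UTC for reproducibility)
values <- c(2, 2, 2, 4, 2, 3, 0, 0, 0, 0, 0, 1, 2, 3, 2)
time1 <- seq(from = as.POSIXct("2013-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = length(values))
data <- xts(values, order.by = time1)

vals <- as.numeric(coredata(data))  # plain numeric vector, no time index
keep <- vals != 0                   # TRUE at the non-zero positions

vals[keep] <- vals[keep] + 10       # placeholder modification

# Reattach the original timestamps; the zeros stay where they were
result <- xts(vals, order.by = index(data))
```

Because the vector never changes length, the timestamps line up again without any merging step.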

Upvotes: 1

kelamahim

Reputation: 587

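This is the solution I ended up with: split the series, then combine the two parts again. Because xts objects are ordered by their time index, the zeros return to their original positions.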

no <- data[data[,1] != 0, ]   # data without zeros
yes <- data[data[,1] == 0, ]  # data with only zeros

together <- c(no, yes)        # both parts combined

Upvotes: 0

digEmAll

Reputation: 57220

Here are two possible approaches:

# re-create your data set
library(xts)
values<-c(2,2,2,4,2,3,0,0,0,0,0,1,2,3,2)
time1<-seq(from=as.POSIXct("2013-01-01 00:00"),to=as.POSIXct("2013-01-01 14:00"),by="hour")
data<-xts(values,order.by=time1)
data

###############################################
# SOLUTION 1 : 
# make a union of the "zero" series and the "zero-free" series

# create a copy of data with no zero
isNotZero = apply(data, 1, function(row) all(row != 0 ))
zeroFreeSeries <- data[isNotZero,]
zeroSeries <- data[!isNotZero,]

# do your calculations on the "zero-free" series (e.g. add 10 to all values)
zeroFreeSeries <- zeroFreeSeries + 10

# union
unionSeries <- rbind(zeroSeries,zeroFreeSeries)

# now unionSeries contains what you desire
unionSeries

###############################################
# SOLUTION 2 : 
# keep the original series copy and after doing your operations
# on the "zero-free" series, update the original series copy
# with the new values (it doesn't work well if you remove some dates from the
# zero-free series)

# create a copy of data with no zero
isNotZero = apply(data, 1, function(row) all(row != 0 ))
zeroFreeSeries <- data[isNotZero,]

# do your operations on the "zero-free" series (e.g. add 10 to all values)
zeroFreeSeries <- zeroFreeSeries + 10

# modify the original data by setting the new values
data[time(zeroFreeSeries),] <- zeroFreeSeries

# now data contains what you desire
data
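
The row-wise apply() mask used in both solutions can also be computed without a per-row function call. Here is a minimal sketch on a plain matrix, assuming the data has no missing values (for an xts object, coredata(data) would supply such a matrix). The two-column matrix `m` is made-up illustration data.

```r
# Made-up two-column data; a row counts as "zero-free" only if no column is 0
m <- cbind(a = c(2, 0, 3, 4), b = c(1, 5, 0, 2))

# Row-wise version, as used above
maskApply <- apply(m, 1, function(row) all(row != 0))

# Vectorized version: count the zeros in each row and keep rows with none
maskVectorized <- rowSums(m == 0) == 0
```

Both masks select the same rows; the vectorized form mainly pays off on long series.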

Upvotes: 1

Drj

Reputation: 1256

I am building on @zx8754's comment.

One way is to split the data frame. If you are worried about messing up the indexes or about joining the data frames back together, here is an alternative approach.

Create a logical (TRUE/FALSE) index and modify only the rows it selects:

idx <- df$col != 0      # TRUE where the value is non-zero
df$col[idx] <- 2007     # or whatever operation
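
A self-contained sketch of this idea, using a made-up data frame `df` with a column named `col` (both names are placeholders) and adding 10 as the stand-in operation:

```r
# Made-up data: the same values as in the question, stored in a data frame
df <- data.frame(col = c(2, 2, 2, 4, 2, 3, 0, 0, 0, 0, 0, 1, 2, 3, 2))

idx <- df$col != 0               # TRUE for the rows to modify
df$col[idx] <- df$col[idx] + 10  # zeros are never touched, so nothing to re-insert
```

Since the zeros never leave the data frame, their row positions are preserved by construction.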

Upvotes: 0
