Reputation: 283
I have a data frame that has 2 columns.
column1 has random numbers; column2 is a placeholder column that drives what I want column3 to look like.
random temp
0.502423373 1
0.687594055 0
0.741883739 0
0.445364032 0
0.50626137 0.5
0.516364981 0
...
I want to fill column3 so that it takes the last non-zero number (1 or 0.5 in this example) and keeps filling the following rows with that value until it reaches the next non-zero number. It then repeats the process for the entire column.
random temp state
0.502423373 1 1
0.687594055 0 1
0.741883739 0 1
0.445364032 0 1
0.50626137 0.5 0.5
0.516364981 0 0.5
0.807804708 0 0.5
0.247948445 0 0.5
0.46573337 0 0.5
0.103705154 0 0.5
0.079625868 1 1
0.938928944 0 1
0.677713019 0 1
0.112231619 0 1
0.165907178 0 1
0.836195267 0 1
0.387712998 1 1
0.147737077 0 1
0.439281543 0.5 0.5
0.089013503 0 0.5
0.84174743 0 0.5
0.931738707 0 0.5
0.807955172 1 1
Thanks for any and all help.
Upvotes: 10
Views: 7799
Reputation: 1
Simply use a loop with a global variable. The global variable used here is m, and r is a data frame with two columns, A and B.
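r <- data.frame(A = 1:7, B = c(1, NA, NA, NA, 3, NA, 6))  # example data; column A is a placeholder
m <- 1  # index of the most recent non-NA row
for (i in seq_len(nrow(r))) {
  if (!is.na(r$B[i])) {
    m <<- i  # please note the assignment sign, "<<-"
    next
  } else {
    r$B[i] <- r$B[m]  # carry the last non-NA value forward
  }
}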
After Execution :
r$B = 1 1 1 1 3 3 6
Upvotes: -1
Reputation: 22293
Inspired by the solution of @Ananda Mahto, this is an adaptation of the internal code of na.locf that works directly with 0s instead of NAs. You then don't need the zoo package, and you don't need the preprocessing step of changing the values to NA. Benchmark tests show that this is about 10 times faster than the original version.
locf.0 <- function(x) {
L <- x!=0
idx <- c(0, which(L))[cumsum(L) + 1]
return(x[idx])
}
mydf$state <- locf.0(mydf$temp)
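One edge case worth noting: if the vector starts with a 0, the index 0 selects nothing, so the result comes back shorter than the input. Below is a minimal sketch of a length-preserving variant. The name locf_zero and the choice to return NA for leading zeros are assumptions for illustration, not part of the original answer, and the sketch assumes the input contains no NA values.

```r
# Hypothetical variant of locf.0 that keeps the output the same length as the input.
# Assumption: x is numeric and contains no NA values.
locf_zero <- function(x) {
  nonzero <- x != 0
  # cumsum(nonzero) counts how many non-zero values have been seen so far;
  # a count of 0 (leading zeros) maps to the NA placed in front.
  c(NA, x[nonzero])[cumsum(nonzero) + 1]
}

locf_zero(c(0, 1, 0, 0.5, 0))
# [1]  NA 1.0 1.0 0.5 0.5
```

If leading zeros should stay 0 rather than become NA, replacing the NA with 0 in the function body would do that.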
Upvotes: 5
Reputation: 13122
Also, unless I'm overlooking something, this seems to work:
DF$state2 <- ave(DF$temp, cumsum(DF$temp), FUN = function(x) x[x != 0])
DF
# random temp state state2
#1 0.50242337 1.0 1.0 1.0
#2 0.68759406 0.0 1.0 1.0
#3 0.74188374 0.0 1.0 1.0
#4 0.44536403 0.0 1.0 1.0
#5 0.50626137 0.5 0.5 0.5
#6 0.51636498 0.0 0.5 0.5
#7 0.80780471 0.0 0.5 0.5
#8 0.24794844 0.0 0.5 0.5
#9 0.46573337 0.0 0.5 0.5
#10 0.10370515 0.0 0.5 0.5
#11 0.07962587 1.0 1.0 1.0
#12 0.93892894 0.0 1.0 1.0
#13 0.67771302 0.0 1.0 1.0
#14 0.11223162 0.0 1.0 1.0
#15 0.16590718 0.0 1.0 1.0
#16 0.83619527 0.0 1.0 1.0
#17 0.38771300 1.0 1.0 1.0
#18 0.14773708 0.0 1.0 1.0
#19 0.43928154 0.5 0.5 0.5
#20 0.08901350 0.0 0.5 0.5
#21 0.84174743 0.0 0.5 0.5
#22 0.93173871 0.0 0.5 0.5
#23 0.80795517 1.0 1.0 1.0
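One caveat, offered as a sketch rather than a correction: grouping by cumsum(DF$temp) relies on the non-zero values all being positive, so that the running sum never returns to an earlier value. If negative values can occur, grouping by a running count of non-zero entries may be safer. The vector below is made up for illustration and is not from the question.

```r
# Made-up vector in which the running sum revisits an earlier value (1, 1, 3, 3, 1, 1),
# so cumsum(temp) would put rows 1-2 and rows 5-6 in the same group.
temp <- c(1, 0, 2, 0, -2, 0)

# Group by the count of non-zero values seen so far; each group then starts
# with the non-zero value that should be carried forward.
grp <- cumsum(temp != 0)
state <- ave(temp, grp, FUN = function(x) x[1])
state
# [1]  1  1  2  2 -2 -2
```

With this grouping, any leading zeros form their own group and simply stay 0.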
Upvotes: 3
Reputation: 193517
Perhaps you can make use of na.locf from the "zoo" package after setting values of "0" to NA. Assuming your data.frame is called "mydf":
mydf$state <- mydf$temp
mydf$state[mydf$state == 0] <- NA
library(zoo)
mydf$state <- na.locf(mydf$state, na.rm = FALSE)  # na.rm = FALSE keeps a leading NA instead of dropping it
# random temp state
# 1 0.5024234 1.0 1.0
# 2 0.6875941 0.0 1.0
# 3 0.7418837 0.0 1.0
# 4 0.4453640 0.0 1.0
# 5 0.5062614 0.5 0.5
# 6 0.5163650 0.0 0.5
If there were NA values in the "temp" column of your original data.frame, and you wanted to keep them as NA in the newly generated "state" column too, that's easy to take care of. Just add one more line to reintroduce the NA values:
mydf$state[is.na(mydf$temp)] <- NA
Upvotes: 12
Reputation: 9687
I suggest using the run-length encoding functions; they are a natural way of dealing with streaks in a data set. Using @Kevin's example vector:
temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
y <- rle(temp)
#str(y)
#List of 2
# $ lengths: int [1:11] 1 3 1 5 1 5 1 1 1 3 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"
for( i in seq(y$values)[-1] ) {
if(y$values[i] == 0) {
y$lengths[i-1] = y$lengths[i] + y$lengths[i-1]
y$lengths[i] = 0
}
}
#str(y)
#List of 2
# $ lengths: num [1:11] 4 0 6 0 6 0 2 0 4 0 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"
inverse.rle(y)
# [1] 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.5
# [20] 0.5 0.5 0.5 1.0
Upvotes: 0
Reputation: 6671
Here is an interesting way with the Reduce function.
temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
fill_zero = function(x,y) if(y==0) x else y
state = Reduce(fill_zero, temp, accumulate=TRUE)
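To see what Reduce is doing here, a short trace on a five-element vector (made up for illustration) may help. With accumulate = TRUE, each step receives the previous result and the current element, and every intermediate result is kept.

```r
# Same helper as above, with argument names that spell out their roles.
fill_zero <- function(prev, cur) if (cur == 0) prev else cur

small <- c(1, 0, 0, 0.5, 0)

# Steps: 1 (start), fill_zero(1, 0) = 1, fill_zero(1, 0) = 1,
#        fill_zero(1, 0.5) = 0.5, fill_zero(0.5, 0) = 0.5
small_state <- Reduce(fill_zero, small, accumulate = TRUE)
small_state
# [1] 1.0 1.0 1.0 0.5 0.5
```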
If you're worried about speed, you can try Rcpp.
library(Rcpp)
cppFunction('
NumericVector fill_zeros( NumericVector x ) {
  NumericVector out = clone(x);  // copy so the input vector is not modified in place
  for( int i=1; i<out.size(); i++ )
    if( out[i]==0 ) out[i] = out[i-1];
  return out;
}
')
state = fill_zeros(temp)
Upvotes: 3
Reputation: 12875
A loop along the following lines should do the trick for you -
for(i in seq_len(nrow(df))[-1])  # start at row 2: row 1 has no previous row to copy from
{
  if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
}
Output -
> df
v1 somedata
1 1 33
2 2 24
3 1 36
4 0 49
5 2 89
6 2 48
7 0 4
8 1 98
9 1 60
10 2 76
>
> for(i in seq(nrow(df)))
+ {
+ if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
+ }
> df
v1 somedata
1 1 33
2 2 24
3 1 36
4 1 49
5 2 89
6 2 48
7 2 4
8 1 98
9 1 60
10 2 76
Upvotes: 0