user1228982

Reputation: 105

How to create a column in R based on the values in the rows above/below in a different column

I have been thinking about this for a while and cannot come up with a solution. I want to use the data in column X to create column Z: Z should be all 1's up to the point where X has two 0's in a row, and all 0's from there on. Similarly, I want the final elements of column W to be 1's: reading Y from the bottom up, W is 1 until Y contains two 0's in a row, and 0 above that. Hope that makes sense. I have filled in columns Z and W below to show how they should end up looking. I am trying to use indexing, but I am having a hard time figuring out how to reference the rows of X that come after the row whose Z value I am filling in (because the value in row 1 of Z is based on the values of rows 2 and 3 in X). These should be two separate functions, one that scans from the beginning and one that scans from the end. Both will be applied to each column separately, so column X will produce two columns: Z as below, and another column that in this case would be all 0's. Thanks for any help!

EDIT: I changed the column names from A B C D to X Y Z W to avoid confusion. Sorry, I wasn't thinking about that as I typed it up!

EDIT: I would really like to do this without functions or loops, using only indexing. I think I could figure it out with a function, but since it is a large data set I want it to be as fast as possible.

code    X   Y   Z   W
A   1   0   1   0
A   1   0   1   0
A   0   0   1   0
A   1   0   1   0
A   1   0   1   0
A   1   0   1   0
A   1   0   1   0
A   0   0   1   0
A   1   0   1   0
A   0   0   0   0
A   0   0   0   0
A   1   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   1   1   0   0
B   0   0   0   0
B   1   0   0   0
B   0   0   0   0
B   1   0   0   0
B   0   0   0   0
B   0   0   0   0
B   1   0   0   0
B   0   1   0   0
B   0   0   0   0
B   0   0   0   0
B   0   1   0   1
B   0   1   0   1
B   0   1   0   1
B   0   0   0   1
B   0   1   0   1
B   0   1   0   1

I think I've got it figured out, based on Tyler's code with a few changes (thanks to Tyler for starting the function). I'll apply the function below using aggregate, and it should give the results I'm looking for. I still feel there should be a simpler way to do this, but for now this will do. Thanks to everyone for your input!

pat.finder <- function(var, value=0, fill1=1, fill2=0, rev=FALSE, seq=2){

 # work on the reversed vector when scanning from the end
 v <- if(rev) rev(var) else var
 x <- rle(v)

 # first run of `value` that is at least `seq` long (NA if there is none)
 hit <- which(x$lengths >= seq & x$values == value)[1]

 # i = number of leading elements that get fill1: everything before that run
 # (0 if the run comes first), or the whole vector if no such run exists
 i <- if(is.na(hit)) length(v) else sum(x$lengths[seq_len(hit - 1)])

 j <- c(rep(fill1, i), rep(fill2, length(v) - i))
 if(rev) j <- rev(j)

 return(j)
} 
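Since pat.finder returns one value per row rather than a single summary per group, base R's ave() may be a more natural fit than aggregate for applying it within each code group. A minimal sketch, assuming the columns are named code, X and Y as in the table above; pat.finder is restated in compact form so the block runs on its own (it returns all fill1 when no qualifying run exists).

```r
# Compact restatement of pat.finder: fill1 up to the first run of `value`
# that is at least `seq` long, fill2 from there on (scanned from the end if rev = TRUE).
pat.finder <- function(var, value = 0, fill1 = 1, fill2 = 0, rev = FALSE, seq = 2) {
  v <- if (rev) rev(var) else var
  r <- rle(v)
  hit <- which(r$lengths >= seq & r$values == value)[1]
  i <- if (is.na(hit)) length(v) else sum(r$lengths[seq_len(hit - 1)])
  out <- c(rep(fill1, i), rep(fill2, length(v) - i))
  if (rev) rev(out) else out
}

# The example data from the question (columns code, X and Y only).
dat <- data.frame(
  code = rep(c("A", "B"), each = 21),
  X = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, rep(0, 9),
        0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, rep(0, 9)),
  Y = c(rep(0, 21),
        0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1)
)

# ave() splits each column by `code`, applies the function to each piece,
# and writes the results back in the original row order.
dat$Z <- ave(dat$X, dat$code, FUN = pat.finder)
dat$W <- ave(dat$Y, dat$code, FUN = function(v) pat.finder(v, rev = TRUE))
```

On this data the grouped results match the Z and W columns shown in the question.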

Upvotes: 3

Views: 2406

Answers (4)

baha-kev

Reputation: 3059

This might work for your needs (it only handles column A). If you can be more specific about exactly what you are looking for, the board can help further.

## read in your data
df1 = read.table(text="code    A   B   C   D 
A   1   0   1   0
A   1   0   1   0
...
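", header = TRUE)

## create forward-lagged A column (row i holds A[i+1]; the last row is padded with NA)
library(taRifx)
df1$lagA = shift(df1$A, wrap = FALSE, pad = TRUE)

## 1 if A[i] or A[i+1] is nonzero, 0 if both are 0
myfun1 = function(x, y) {
     BB = x + y
     BB = ifelse(BB > 0, 1, 0)
     BB
}

## myfun1 is already vectorized, so there is no need for apply() over rows
df1$A2 = myfun1(df1$A, df1$lagA)
first0 = which(df1$A2 == 0)[1]   # first row where A and the next A are both 0
tvec = rep(1, first0 - 1)
bvec = vector(length = nrow(df1) - first0 + 1, mode = "numeric")  # all zeros

## the column you are looking for: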
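If you'd rather not depend on taRifx, the same lag-and-sum idea can be written in base R by building the shifted vector by hand. A minimal sketch; first_double_zero_fill is a hypothetical helper name, and x stands for whichever 0/1 column is being scanned.

```r
# Returns 1 up to (not including) the first position i where x[i] and x[i + 1]
# are both 0, and 0 from there on; all 1s if no such position exists.
first_double_zero_fill <- function(x) {
  lead_x <- c(x[-1], NA)                 # x shifted up one row, NA-padded at the end
  either_nonzero <- (x + lead_x) > 0     # FALSE where x[i] and x[i + 1] are both 0
  stop_at <- which(!either_nonzero)[1]   # which() drops the NA in the last row
  if (is.na(stop_at)) return(rep(1, length(x)))
  c(rep(1, stop_at - 1), rep(0, length(x) - stop_at + 1))
}

# Column X for code A from the question
first_double_zero_fill(c(1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, rep(0, 9)))
```

This reproduces the tvec/bvec construction above (1s before the first double zero, 0s from it onwards) and also covers the case where no double zero exists.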
df1$nA = c(tvec,bvec)

Upvotes: 1

G. Grothendieck

Reputation: 269481

Suppose that the data frame shown in the question is DF. The ith element of the result of pmax is 0 if the ith and (i+1)th elements of x are both 0, and nonzero otherwise. We append a 1 at the end since the last element of x has no next element. We then compare that to 0, and cummin carries the first 0 found by this process forward through the rest of the vector.

two0 <- function(x) cummin(c(pmax(x[-1], x[-length(x)]), 1) != 0)
DF.out <- transform(DF, Z = two0(X), W = rev(two0(rev(Y))))

The != 0 makes the result of two0 integer. If we wish, we can drop it, in which case the result will be numeric.

EDIT: clarified integer/numeric aspect.
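To check this against the expected columns, here is a self-contained run on the question's data, typed in as vectors (only code, X and Y are needed); two0 and the transform call are exactly as in the answer.

```r
two0 <- function(x) cummin(c(pmax(x[-1], x[-length(x)]), 1) != 0)

# The example data from the question (columns code, X and Y only).
DF <- data.frame(
  code = rep(c("A", "B"), each = 21),
  X = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, rep(0, 9),
        0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, rep(0, 9)),
  Y = c(rep(0, 21),
        0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1)
)

DF.out <- transform(DF, Z = two0(X), W = rev(two0(rev(Y))))
DF.out$Z  # 1 for the first 9 rows, 0 afterwards
DF.out$W  # 0 except for the last 6 rows
```

Everything here is vectorized (pmax, cummin, rev), so there is no explicit loop over rows.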

Upvotes: 1

Tyler Rinker

Reputation: 109844

There's probably a faster way but this is what I came up with:

# read in your data
dat <- read.table(text="code    A   B   C   D
A   1   0   1   0
A   1   0   1   0
A   0   0   1   0
A   1   0   1   0
A   1   0   1   0
A   1   0   1   0
A   1   0   1   0
A   0   0   1   0
A   1   0   1   0
A   0   0   0   0
A   0   0   0   0
A   1   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
A   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   0   0   0   0
B   1   1   0   0
B   0   0   0   0
B   1   0   0   0
B   0   0   0   0
B   1   0   0   0
B   0   0   0   0
B   0   0   0   0
B   1   0   0   0
B   0   1   0   0
B   0   0   0   0
B   0   0   0   0
B   0   1   0   1
B   0   1   0   1
B   0   1   0   1
B   0   0   0   1
B   0   1   0   1
B   0   1   0   1", header=TRUE)

Now the code:

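# rle() splits A into runs: lengths = run lengths, values = the repeated value
A.rle <- rle(dat$A)
# n = number of runs before the first run of two or more 0's
n <- which(A.rle$lengths > 1 & A.rle$values == 0)[1] - 1
# i = number of rows covered by those runs; they get 1, the rest get 0
i <- sum(A.rle$lengths[seq_len(n)])
dat$C <- c(rep(1, i), rep(0, nrow(dat) - i))

# same idea on the reversed B column, then reverse the result back
B.rle <- rle(rev(dat$B))
n2 <- which(B.rle$lengths > 1 & B.rle$values == 0)[1] - 1
i2 <- sum(B.rle$lengths[seq_len(n2)])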
dat$D <- rev(c(rep(1, i2), rep(0, nrow(dat)-i2)))
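To make the rle step concrete, here is what it computes on the first 12 values of column A (typed in by hand for illustration):

```r
x <- c(1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1)
r <- rle(x)
r$lengths  # 2 1 4 1 1 2 1
r$values   # 1 0 1 0 1 0 1

# the first run of 0's with length > 1 is run 6, so n = 5 runs come before it
n <- which(r$lengths > 1 & r$values == 0)[1] - 1

# those 5 runs cover 2 + 1 + 4 + 1 + 1 = 9 elements, so column C gets nine 1's
i <- sum(r$lengths[seq_len(n)])
```

seq_len(n) is used instead of 1:n because when the double-zero run comes first, n is 0, and 1:0 is c(1, 0), which would wrongly select the length of the first run.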

EDIT: I'm not sure I fully understand what you want, so I've tried to create a function that is versatile enough for your needs. Use rev=TRUE to look at the end:

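pat.finder <- function(var, value=0, fill1=1, fill2=0, rev=FALSE, seq=2){
    x <- if(rev) rle(rev(var)) else rle(var)
    # number of runs before the first run of `value` that is at least `seq` long
    n <- which(x[[1]]>(seq-1) & x[[2]]==value)[1]-1
    # no such run: every element gets fill1
    if(is.na(n)) n <- length(x[[1]])
    # seq_len(n) rather than 1:n, so that n = 0 sums to 0
    i <- sum(x[[1]][seq_len(n)])
    j <- if(rev){
               rev(c(rep(fill1, i), rep(fill2, length(var)-i)))
          } else {
               c(rep(fill1, i), rep(fill2, length(var)-i))
          }
    return(j)
}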

#TRY IT OUT
pat.finder(dat$B, rev=TRUE)

transform(dat, C=pat.finder(A), D = pat.finder(B, rev=TRUE)) #what I think you want

transform(dat, C=pat.finder(A, fill1='foo', fill2='bar'), 
    D = pat.finder(A, rev=TRUE))

transform(dat, C=pat.finder(A, value=1), D = pat.finder(B, rev=TRUE))

Upvotes: 1

Carl Witthoft

Reputation: 21492

Consider sum(dat$A[i:(i+1)]). Since A only contains 0's and 1's, that is zero iff you have two zeroes in a row. Either use a loop (or lapply) or one of the running/rolling-window functions to find the minimum "i" that returns a zero, and you've found where to "toggle" column C from 1 to zero.

But I really have to ask: "What is the problem you are trying to solve?" I can almost guarantee if you tell us where the data in columns A and B came from, we can show you a much more direct way to identify the breakpoints you are setting up in columns C and D.

PS: once a solution is set up for dat$C, just do the same but looping downwards from "imax" to 1 to get dat$D
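The sum(dat$A[i:(i+1)]) test can also be evaluated for every i at once, with no loop, by adding the vector to a copy of itself shifted by one. A sketch under that approach; first_toggle and toggle_column are hypothetical helper names, and running the same code on the reversed column covers the backward (dat$D) case.

```r
# Position of the first i with x[i] + x[i + 1] == 0, or NA if there is none.
first_toggle <- function(x) {
  pair_sums <- head(x, -1) + tail(x, -1)  # pair_sums[i] == x[i] + x[i + 1]
  which(pair_sums == 0)[1]
}

# 1 before the toggle point, 0 from it onwards (all 1's if there is no toggle).
toggle_column <- function(x) {
  i <- first_toggle(x)
  if (is.na(i)) rep(1, length(x)) else as.numeric(seq_along(x) < i)
}

A <- c(1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0)
C <- toggle_column(A)            # scan from the top
D <- rev(toggle_column(rev(A)))  # scan from the bottom
```

Here C has nine 1's followed by 0's, and D is all 0's because A ends in two zeroes.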

Upvotes: 1
