Reputation: 351
Suppose I have a data frame that looks something like this:
>df
city year ceep
1 1 1
1 2 1
1 3 0
1 4 1
1 5 0
2 1 0
2 2 1
2 3 1
2 4 0
2 5 1
3 1 1
3 2 0
3 3 1
3 4 0
3 5 1
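For reproducibility, a data frame with these values can be built as follows. This is a minimal sketch that assumes all three columns are plain numeric/integer vectors:

```r
# Rebuild the example data: 3 cities, 5 years each
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0,
           0, 1, 1, 0, 1,
           1, 0, 1, 0, 1)
)
```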
Now I want to create a new variable 'veep' that depends on the values of 'city' and 'ceep' from different rows. For instance,
veep=1 if ceep[_n-1]=1 & city=city[_n-1]
veep=1 if ceep[_n+2]=1 & ceep[_n+3]=1 & city=city[_n+3]
where n is the row of the observation. I'm not sure how to translate these conditions into R. I guess where I'm having trouble is selecting the row of the observation. I'm thinking of code somewhere along the lines of:
df$veep[df$ceep(of the n-1th observation)==1 & city==city(n-1th observ.)] <- 1
df$veep[df$ceep(of the n+2th observation)==1 & df$ceep(of the n+3th observation)==1 &
city==city(n+3th observ.)] <- 1
#note: what's in parentheses is just to demonstrate where I'm having trouble
Can anyone provide help on this?
Upvotes: 1
Views: 975
Reputation: 19454
Here's a way to write out your logical steps. Note the use of idx to index the vectors. That was necessary to avoid out-of-range indexes.
idx <- seq_len(nrow(df))
# Set a default value for the new variable
df$veep <- NA
Your first set of logical criteria cannot be applied to the first row of df, since the index n - 1 would be 0, and this is not a valid row index. So, use tail(*, -1) to pick out all but the first entries of veep and city and use head(*, -1) to pick out all but the last of ceep and city.
df[tail(idx, -1), "veep"] <- ifelse(
head(df$ceep, -1) == 1 &
tail(df$city, -1) == head(df$city, -1),
1, tail(df$veep, -1))
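To see why the two pieces line up, here is a small sketch with a made-up vector x (not part of your data). Dropping the first element gives the "current" rows 2 to 5, and dropping the last gives the matching "previous" rows 1 to 4:

```r
# Made-up vector, used only to illustrate the alignment
x <- c(10, 20, 30, 40, 50)

current  <- tail(x, -1)  # rows 2..5: 20 30 40 50
previous <- head(x, -1)  # rows 1..4: 10 20 30 40

# Position k of `current` is row k + 1; position k of `previous` is row k,
# i.e. the row immediately before it.
cbind(current, previous)
```

Because both vectors have length 4, they can be compared element by element without ever touching row 0.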
Your next set of criteria cannot be applied to the last three rows of df, since n + 3 would then be an invalid index. So use the head and tail functions again. One tricky part is the fact that the first ceep statement is based on n + 2, not n + 3, so that a combination of head and tail is required.
df[head(idx, -3), "veep"] <- ifelse(
head(tail(df$ceep, -2), -1) == 1 &
tail(df$ceep, -3) == 1 &
head(df$city, -3) == tail(df$city, -3),
1, head(df$veep, -3))
> df$veep
[1] NA 1 1 NA 1 NA NA 1 1 NA NA 1 NA 1 NA
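The same conditions can also be written with a small helper that looks up row n + k and returns NA when that row does not exist. This is only a sketch: shift_by is a hypothetical name I made up, not a base R function, and the data frame is rebuilt here so the snippet runs on its own. It should give the same veep vector as above.

```r
# Example data from the question
df <- data.frame(
  city = rep(1:3, each = 5),
  ceep = c(1, 1, 0, 1, 0,  0, 1, 1, 0, 1,  1, 0, 1, 0, 1)
)

# Hypothetical helper: value of x at row n + k, or NA when n + k is out of range
shift_by <- function(x, k) {
  i <- seq_along(x) + k
  i[i < 1 | i > length(x)] <- NA
  x[i]
}

rule1 <- shift_by(df$ceep, -1) == 1 & shift_by(df$city, -1) == df$city
rule2 <- shift_by(df$ceep, 2) == 1 & shift_by(df$ceep, 3) == 1 &
  shift_by(df$city, 3) == df$city

# which() drops the NA comparisons produced at the edges
df$veep <- NA
df$veep[which(rule1 | rule2)] <- 1
df$veep
```

The trade-off is that the edge handling moves into the helper, so each rule reads much like the original _n - 1 and _n + 3 conditions.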
Upvotes: 2
Reputation: 30475
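You can use a for loop like this, checking both conditions and skipping rows where the neighbouring row does not exist:
df$veep <- 0
n <- nrow(df)
for (i in seq_len(n)) {
  # Rule 1: the previous row is in the same city and has ceep == 1
  if (i > 1 && df[i - 1, "ceep"] == 1 && df[i - 1, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
  # Rule 2: rows i + 2 and i + 3 have ceep == 1 and row i + 3 is in the same city
  if (i + 3 <= n && df[i + 2, "ceep"] == 1 && df[i + 3, "ceep"] == 1 &&
      df[i + 3, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
}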
Upvotes: 1