Reputation: 135
I am looking for a way to do row-by-row updates with the dplyr package in R, the way a loop does: each row is processed only after the previous rows have been updated.
For instance,
X <- data.frame(a = c(1,NA,NA,NA))
for (i in 2:nrow(X)){
X$a[i] = X$a[i-1] + 1
}
X
a
1 1
2 2
3 3
4 4
So row 3 takes its value from row 2 only after row 2 has been set to 2 in the previous loop iteration.
If I try to do this with the usual dplyr::mutate function, I get:
library(dplyr)
X <- data.frame(a = c(1,NA,NA,NA))
X %>% mutate(a = if_else(row_number() == 1, a, lag(a) + 1) )
a
1 1
2 2
3 NA
4 NA
Any ideas how to get the first output using dplyr?
Let me give a more specific and more complicated example:
> X <- data.frame(date_1 = c("2000-01-01", "2001-01-01", NA, NA, NA, "2007-01-01", NA, NA),
+ date_2 = c("2002-01-01", "2002-01-01", "2002-01-01", "2002-01-01", "2003-01-01", "2008-01-01", "2010-01-01", "2010-01-01"),
+ stringsAsFactors=FALSE)
> X
date_1 date_2
1 2000-01-01 2002-01-01
2 2001-01-01 2002-01-01
3 <NA> 2002-01-01
4 <NA> 2002-01-01
5 <NA> 2003-01-01
6 2007-01-01 2008-01-01
7 <NA> 2010-01-01
8 <NA> 2010-01-01
>
and I want to fill it using the following loop:
> for (i in 2:nrow(X)){
+ X$date_1[i] <- if_else(!is.na(X$date_1[i]), X$date_1[i],
+ if_else(X$date_2[i-1] == X$date_2[i], X$date_1[i-1], X$date_2[i-1]))
+ }
> X
date_1 date_2
1 2000-01-01 2002-01-01
2 2001-01-01 2002-01-01
3 2001-01-01 2002-01-01
4 2001-01-01 2002-01-01
5 2002-01-01 2003-01-01
6 2007-01-01 2008-01-01
7 2008-01-01 2010-01-01
8 2008-01-01 2010-01-01
The dplyr version would look like this, but it leaves rows 4 and 8 as NA:
> X %>% mutate( date_1 = if_else(row_number() == 1, date_1,
+ if_else(!is.na(date_1), date_1,
+ if_else(date_2 == lag(date_2), lag(date_1),
+ lag(date_2))))
+ )
date_1 date_2
1 2000-01-01 2002-01-01
2 2001-01-01 2002-01-01
3 2001-01-01 2002-01-01
4 <NA> 2002-01-01
5 2002-01-01 2003-01-01
6 2007-01-01 2008-01-01
7 2008-01-01 2010-01-01
8 <NA> 2010-01-01
Upvotes: 1
Views: 736
Reputation: 13274
Try:
library(tidyverse)
X %>%
  fill(a) %>%
  mutate(a = a + seq_along(a) - 1)
or
X %>%
  fill(a) %>%
  # a != 0 is used instead of !!a, which dplyr would read as the unquote operator
  mutate(a = a + which(a != 0) - 1)
That should yield:
# a
#1 1
#2 2
#3 3
#4 4
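If you need a true "use the already-updated previous row" recurrence rather than a closed-form shortcut, one option is purrr::accumulate() inside mutate(). This is only a minimal sketch, assuming purrr is installed; the argument names prev and cur are illustrative, and the step function simply mirrors the loop body X$a[i] = X$a[i-1] + 1.

```r
library(dplyr)
library(purrr)

X <- data.frame(a = c(1, NA, NA, NA))

# accumulate() feeds the previous *result* (prev) back into the next step,
# so each row sees the value that was just computed for the row above it.
# cur (the original value in the current row) is ignored, as in the loop.
res <- X %>%
  mutate(a = accumulate(a, function(prev, cur) prev + 1))

res$a
# [1] 1 2 3 4
```

The first element is kept as is, which corresponds to the loop starting at i = 2.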
A solution for the second example:
X <- data.frame(date_1 = c("2000-01-01", "2001-01-01", NA, NA, NA, "2007-01-01", NA, NA),
date_2 = c("2002-01-01", "2002-01-01", "2002-01-01", "2002-01-01", "2003-01-01","2008-01-01", "2010-01-01", "2010-01-01"), stringsAsFactors=FALSE)
X %>%
  group_by(date_2) %>%
  fill(date_1) %>%
  ungroup() %>%
  mutate(date_3 = lag(date_2)) %>%
  group_by(date_1, date_2) %>%
  mutate(date_3 = if_else(is.na(date_1), head(date_3, 1), date_3)) %>%
  ungroup() %>%
  mutate(date_1 = if_else(is.na(date_1), date_3, date_1)) %>%
  select(date_1, date_2)
Output:
date_1 date_2
2000-01-01 2002-01-01
2001-01-01 2002-01-01
2001-01-01 2002-01-01
2001-01-01 2002-01-01
2002-01-01 2003-01-01
2007-01-01 2008-01-01
2008-01-01 2010-01-01
2008-01-01 2010-01-01
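As an alternative that follows the loop from the question step by step, the same recurrence can probably be written with purrr::accumulate() over the row indices. This is a sketch under assumptions: fill_date_1 and step are hypothetical helper names introduced here, not part of dplyr or purrr, and the logic is copied from the loop body in the question.

```r
library(dplyr)
library(purrr)

X <- data.frame(
  date_1 = c("2000-01-01", "2001-01-01", NA, NA, NA, "2007-01-01", NA, NA),
  date_2 = c("2002-01-01", "2002-01-01", "2002-01-01", "2002-01-01",
             "2003-01-01", "2008-01-01", "2010-01-01", "2010-01-01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walks rows 2..n and carries the already-filled
# date_1 of the previous row (prev_d1) into the next step.
fill_date_1 <- function(d1, d2) {
  step <- function(prev_d1, i) {
    if (!is.na(d1[i])) {
      d1[i]                      # keep an existing value
    } else if (d2[i - 1] == d2[i]) {
      prev_d1                    # same date_2: reuse the updated previous date_1
    } else {
      d2[i - 1]                  # date_2 changed: take the previous date_2
    }
  }
  # .init supplies row 1, so the result has the same length as d1
  accumulate(seq_along(d1)[-1], step, .init = d1[1])
}

res <- X %>% mutate(date_1 = fill_date_1(date_1, date_2))
res$date_1
# [1] "2000-01-01" "2001-01-01" "2001-01-01" "2001-01-01" "2002-01-01"
# [6] "2007-01-01" "2008-01-01" "2008-01-01"
```

This keeps the sequential dependency explicit, at the cost of not being vectorised, so the grouped fill() version above is likely the better choice for large data.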
I hope this helps.
Upvotes: 1