Reputation: 1621
I have the following dataframe in R (only sample data)
data <- data.frame(NAME=c("NAME1", "NAME1", "NAME1","NAME2","NAME2","NAME2"),
ID=c(47,47,47,259,259,259),
SURVEY_YEAR=c(1960,1961,1965,2007,2010,2014),
REFERENCE_YEAR=c(1959,1960,1963,2004,2009,2011),
CUMULATIVE_SUM=c(-6,-10,-23,-9,NA,-40))
In a tabular form it looks like this:
NAME ID SURVEY_YEAR REFERENCE_YEAR CUMULATIVE_SUM
1 NAME1 47 1960 1959 -6
2 NAME1 47 1961 1960 -10
3 NAME1 47 1965 1963 -23
4 NAME2 259 2007 2004 -9
5 NAME2 259 2010 2009 NA
6 NAME2 259 2014 2011 -40
What I am trying to do is to restructure my dataframe so that it should look like this in the end:
NAME ID SURVEY_YEAR REFERENCE_YEAR CUMULATIVE_SUM CUMULATIVE_SUM_REFYEAR
1 NAME1 47 1960 1959 -6 0
2 NAME1 47 1961 1960 -10 -6
3 NAME1 47 1965 1963 -23 -10
4 NAME2 259 2007 2004 -9 0
5 NAME2 259 2010 2009 NA NA
6 NAME2 259 2014 2011 -40 -9
I am trying to achieve this by using the following code:
# loop through elements in data$CUMULATIVE_SUM
for (i in 1:length(data$CUMULATIVE_SUM)) {
# take value of upper row, but take NULL if in upper row there is another NAME or end of table
if (i==1) {
value=0 # If first row
} else {
if (data$NAME[i-1]==data$NAME[i]) {
value=data$CUMULATIVE_SUM[i-1] # Normal case: take upper value
} else {
value=0 # If other NAME
}
}
data$CUMULATIVE_SUM_REFYEAR[i] <- value # Write new value in new column
}
Using this code, my result of the above code looks like this:
NAME ID SURVEY_YEAR REFERENCE_YEAR CUMULATIVE_SUM CUMULATIVE_SUM_REFYEAR
1 NAME1 47 1960 1959 -6 0
2 NAME1 47 1961 1960 -10 -6
3 NAME1 47 1965 1963 -23 -10
4 NAME2 259 2007 2004 -9 0
5 NAME2 259 2010 2009 NA **-9**
6 NAME2 259 2014 2011 -40 NA
As you may have noticed when comparing it with my desired solution the value of -9 is in the wrong place (marked in bold). Is there a way how I could make this exception if there is a NA value in a row? I am stuck. Thank you for your help!
Upvotes: 0
Views: 71
Reputation: 886938
Using dplyr
library(dplyr)
left_join(data, data %>%
group_by(NAME) %>%
filter(!is.na(CUMULATIVE_SUM)) %>%
mutate(CUMULATIVE_SUM_REFYEAR= lag(CUMULATIVE_SUM, 1, 0)))
# NAME ID SURVEY_YEAR REFERENCE_YEAR CUMULATIVE_SUM CUMULATIVE_SUM_REFYEAR
#1 NAME1 47 1960 1959 -6 0
#2 NAME1 47 1961 1960 -10 -6
#3 NAME1 47 1965 1963 -23 -10
#4 NAME2 259 2007 2004 -9 0
#5 NAME2 259 2010 2009 NA NA
#6 NAME2 259 2014 2011 -40 -9
Upvotes: 1
Reputation: 92282
Try
library(data.table)
setDT(data)[!is.na(CUMULATIVE_SUM),
CUMULATIVE_SUM_REFYEAR := c(0, CUMULATIVE_SUM[-.N]),
by = NAME]
data
# NAME ID SURVEY_YEAR REFERENCE_YEAR CUMULATIVE_SUM CUMULATIVE_SUM_REFYEAR
# 1: NAME1 47 1960 1959 -6 0
# 2: NAME1 47 1961 1960 -10 -6
# 3: NAME1 47 1965 1963 -23 -10
# 4: NAME2 259 2007 2004 -9 0
# 5: NAME2 259 2010 2009 NA NA
# 6: NAME2 259 2014 2011 -40 -9
Upvotes: 2