Reputation: 1621
I have a sample dataframe in R that looks like this:
dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2","SAMPLE2","SAMPLE2","SAMPLE2"),
ID=c(33,33,33,33,253,253,253,253),
SURVEY_YEAR=c(1959,1960,1961,1965,2002,2007,2010,2014),
REFERENCE_YEAR=c(1959,1959,1960,1963,2002, 2004,2009,2011),
VALUE=c(0,-6,-10,-23,0,-9,NA,-40))
dat
NAME ID SURVEY_YEAR REFERENCE_YEAR VALUE
1 SAMPLE1 33 1959 1959 0
2 SAMPLE1 33 1960 1959 -6
3 SAMPLE1 33 1961 1960 -10
4 SAMPLE1 33 1965 1963 -23
5 SAMPLE2 253 2002 2002 0
6 SAMPLE2 253 2007 2004 -9
7 SAMPLE2 253 2010 2009 NA
8 SAMPLE2 253 2014 2011 -40
What I am trying to do is to expand and transform REFERENCE_YEAR and SURVEY_YEAR into one column YEAR so that the resulting dataframe looks like this:
NAME ID YEAR VALUE
SAMPLE1 33 1959 0 # VALUE from REFERENCE_YEAR 1959
SAMPLE1 33 1959 0 # VALUE from SURVEY_YEAR 1959
--------------------------------------------------------------------------------
SAMPLE1 33 1959 0 # for REFERENCE_YEAR 1959, take previous VALUE
SAMPLE1 33 1960 -6 # VALUE from SURVEY_YEAR 1960
--------------------------------------------------------------------------------
SAMPLE1 33 1960 -6 # for REFERENCE_YEAR 1960, take previous VALUE
SAMPLE1 33 1961 -10 # VALUE from SURVEY_YEAR 1961
--------------------------------------------------------------------------------
SAMPLE1 33 1963 -10 # for REFERENCE_YEAR 1963, take previous VALUE (-10)
SAMPLE1 33 1965 -23 # VALUE from SURVEY_YEAR 1965
--------------------------------------------------------------------------------
SAMPLE2 253 2002 0 # VALUE from REFERENCE_YEAR 2002
SAMPLE2 253 2002 0 # VALUE from SURVEY_YEAR 2002
--------------------------------------------------------------------------------
SAMPLE2 253 2004 0 # for REFERENCE_YEAR 2004, take previous VALUE (0)
SAMPLE2 253 2007 -9 # VALUE taken from SURVEY_YEAR 2007
--------------------------------------------------------------------------------
SAMPLE2 253 2009 NA # if one value is NA in a period (in this case 2009 to 2010), the whole period should be set to NA
SAMPLE2 253 2010 NA
--------------------------------------------------------------------------------
SAMPLE2 253 2011 -9 # for REFERENCE_YEAR 2011, take previous numerical VALUE (not NA,but -9)
SAMPLE2 253 2014 -40 # VALUE taken from SURVEY_YEAR 2014
Is there an easy way to do this?
EDIT: I'd like the data to be in the above structure because I'd like to plot like this (maybe its more understandable with the graphs?). Here NA values have been added where the series is discontinous (1962 in SAMPLE 1 and 2003 and 2008 in SAMPLE2). This is why the structure should be maintained like in the results window above.
Upvotes: 0
Views: 146
Reputation: 908
Fundamentally, your problem is one of assigning values to years using rules. It is not perfectly clear to me what those rules are, but as a start you can do something like this:
dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2","SAMPLE2","SAMPLE2","SAMPLE2"),
ID=c(33,33,33,33,253,253,253,253),
SURVEY_YEAR=c(1959,1960,1961,1965,2002,2007,2010,2014),
REFERENCE_YEAR=c(1959,1959,1960,1963,2002, 2004,2009,2011),
VALUE=c(0,-6,-10,-23,0,-9,NA,-40))
uyear=data.frame(UYEAR=unique(c(dat$SURVEY_YEAR,dat$REFERENCE_YEAR)),val=NA)
uyear<-uyear[with(uyear,order(UYEAR)),]
for(i in 1:nrow(uyear)) {
if(uyear$UYEAR[i] %in% dat$SURVEY_YEAR) {
uyear$val[i]=dat$VALUE[which(dat$SURVEY_YEAR==uyear$UYEAR[i])[1]]
}else {uyear$val[i]=dat$VALUE[which(dat$REFERENCE_YEAR==uyear$UYEAR[i])[1]-1]}
}
That said, it is a bad idea to have "YEAR" mean two different things (start and end) without keeping a column that explains which is which.
Upvotes: 1