I am trying to get the maximum value in the column event until an agreement (dummy) is reached. Events are nested in agreements, and agreements are nested in dyad, which runs over year. Note that the years are not always continuous, meaning there are gaps between them (e.g. 1986, 1987, 2001, 2002).
I am able to get the maximum values within each dyad with ddply and max(event), but I struggle to assign the different events to the right agreement (until/after). I am basically lacking an identifier which assigns each observation to an agreement.
The results I am looking for are already in the column "result".
dyad  year  event  agreement  agreement.name  result
1     1985  9
1     1986  4      1          agreement1      9
1     1987
1     2001  3
1     2002         1          agreement2      3
2     1999  1
2     2000  5
2     2001         1          agreement3      5
2     2002  2
2     2003
2     2004         1          agreement 4     2
Here is the data in a format which is hopefully easier to use:
df<-structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L,
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA,
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "",
"agreement2", "", "", "agreement3", "", "", "agreement 4"), result = c(NA,
9L, NA, NA, 3L, NA, NA, 5L, NA, NA, 2L)), .Names = c("dyad",
"year", "event", "agreement", "agreement.name", "result"), class = "data.frame", row.names = c(NA,
-11L))
Here is an option using data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), then create another grouping variable ('ind') based on the non-empty elements in 'agreement.name'. Grouped by both the 'dyad' and 'ind' columns, we create a new column 'result', using ifelse to fill the rows where 'agreement.name' is non-empty with the max of 'event'.
library(data.table)
# 'ind' starts a new block on the first row after each agreement (per dyad)
setDT(df)[, ind := cumsum(c(TRUE, diff(agreement.name == '') > 0)), dyad][,
  # within each dyad/ind block, put the max of 'event' on the agreement row
  result := ifelse(agreement.name != '', max(event, na.rm = TRUE), NA),
  list(dyad, ind)][, ind := NULL][]
#     dyad year event agreement agreement.name result
#  1:    1 1985     9        NA                    NA
#  2:    1 1986     4         1     agreement1      9
#  3:    1 1987    NA        NA                    NA
#  4:    1 2001     3        NA                    NA
#  5:    1 2002    NA         1     agreement2      3
#  6:    2 1999     1        NA                    NA
#  7:    2 2000     5        NA                    NA
#  8:    2 2001    NA         1     agreement3      5
#  9:    2 2002     2        NA                    NA
# 10:    2 2003    NA        NA                    NA
# 11:    2 2004    NA         1    agreement 4      2
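To see how the grouping variable is built, here is a minimal base R sketch that walks through the cumsum/diff step on the 'agreement.name' values of dyad 1 only. The intermediate names is_empty and starts_block are made up for illustration and do not appear in the code above.

```r
# Illustration only: the 'agreement.name' values for dyad 1 from the question
agreement.name <- c("", "agreement1", "", "", "agreement2")

# TRUE where no agreement is recorded on that row
is_empty <- agreement.name == ""

# diff() treats the logicals as 0/1; a difference of 1 marks a switch from
# a non-empty name back to an empty one, i.e. the first row after an agreement
starts_block <- c(TRUE, diff(is_empty) > 0)

# the running count of block starts gives one id per "run up to an agreement"
ind <- cumsum(starts_block)
ind
# [1] 1 1 2 2 2
```

Note that this relies on agreement rows being separated by at least one empty row; two agreements on consecutive rows would end up in the same block.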
Or, instead of ifelse, we can use a numeric index:
setDT(df)[, result:=c(NA, max(event, na.rm=TRUE))[(agreement.name!='')+1L] ,
list(ind= cumsum(c(TRUE,diff(agreement.name=='')>0)),dyad)][]
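The numeric-index variant can be checked on a single block in plain base R. The sketch below uses the second block of dyad 1 from the question; group_max and idx are illustrative names only.

```r
# Illustration only: one block (dyad 1, years 1987-2002) from the question
event          <- c(NA, 3L, NA)
agreement.name <- c("", "", "agreement2")

# the largest non-missing 'event' value in the block
group_max <- max(event, na.rm = TRUE)

# FALSE/TRUE plus 1L becomes position 1 or 2 in the lookup vector c(NA, max)
idx <- (agreement.name != "") + 1L

# rows without an agreement pick NA, the agreement row picks the maximum
result <- c(NA, group_max)[idx]
result
# [1] NA NA  3
```

If a block has no non-missing 'event' values at all, max(..., na.rm = TRUE) returns -Inf with a warning, so such blocks may need separate handling.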
The data used above (without the 'result' column):
df <- structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L,
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA,
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "",
"agreement2", "", "", "agreement3", "", "", "agreement 4")),
.Names = c("dyad",
"year", "event", "agreement", "agreement.name"), row.names = c(NA,
-11L), class = "data.frame")