zoowalk

Reputation: 2134

Obtain nested values from a data frame

I am trying to get the maximum value of the column event in the period leading up to each agreement (a dummy variable). Events are nested in agreements, agreements are nested in dyads, and dyads run over years. Note that the years are not always continuous: there are gaps between them (e.g. 1986, 1987, 2001, 2002).

I am able to get the maximum value within each dyad with ddply and max(event), but I am struggling to assign the events to the right agreement (the rows up to an agreement versus the rows after it). I am basically lacking an identifier which assigns each observation to an agreement.

The results I am looking for are already in the column "result".

dyad    year    event   agreement   agreement.name  result
  1     1985    9
  1     1986    4       1           agreement1      9
  1     1987
  1     2001    3
  1     2002            1           agreement2      3
  2     1999    1
  2     2000    5
  2     2001            1           agreement3      5
  2     2002    2
  2     2003
  2     2004            1           agreement 4     2

Here is the data in a format which is hopefully easier to use:

df<-structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L, 
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L, 
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA, 
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "", 
"agreement2", "", "", "agreement3", "", "", "agreement 4"), result = c(NA, 
9L, NA, NA, 3L, NA, NA, 5L, NA, NA, 2L)), .Names = c("dyad", 
"year", "event", "agreement", "agreement.name", "result"), class = "data.frame", row.names = c(NA, 
-11L))

Upvotes: 0

Views: 33

Answers (1)

akrun

Reputation: 887501

Here is an option using data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), then create another grouping variable ('ind') that starts a new group on the first row after each non-empty 'agreement.name' within a 'dyad'. Grouped by both 'dyad' and 'ind', we create a new column 'result' using ifelse, filling the rows where 'agreement.name' is non-empty with the max of 'event' and leaving the other rows as NA.

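library(data.table)
# 'ind' starts a new block on the first row after each agreement, per dyad;
# 'result' is then filled per (dyad, ind) block and 'ind' is dropped again
setDT(df)[, ind := cumsum(c(TRUE, diff(agreement.name == '') > 0)), by = dyad][,
    result := ifelse(agreement.name != '', max(event, na.rm = TRUE), NA),
    by = list(dyad, ind)][, ind := NULL][]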
#    dyad year event agreement agreement.name result
# 1:    1 1985     9        NA                    NA
# 2:    1 1986     4         1     agreement1      9
# 3:    1 1987    NA        NA                    NA
# 4:    1 2001     3        NA                    NA
# 5:    1 2002    NA         1     agreement2      3
# 6:    2 1999     1        NA                    NA
# 7:    2 2000     5        NA                    NA
# 8:    2 2001    NA         1     agreement3      5
# 9:    2 2002     2        NA                    NA
#10:    2 2003    NA        NA                    NA
#11:    2 2004    NA         1    agreement 4      2
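
To see what the intermediate 'ind' column holds, here is a minimal base R sketch for dyad 1 only. The vector is copied from the example data; the names is_empty and starts_new are just illustrative and not part of the solution above.

```r
# agreement.name for dyad 1, copied from the example data
agreement.name <- c("", "agreement1", "", "", "agreement2")

is_empty <- agreement.name == ""            # TRUE FALSE TRUE TRUE FALSE

# diff() > 0 is TRUE only where a non-empty row is followed by an empty one,
# i.e. on the first row after an agreement; the leading TRUE opens block 1
starts_new <- c(TRUE, diff(is_empty) > 0)   # TRUE FALSE TRUE FALSE FALSE

# the running count of block starts is the block identifier
ind <- cumsum(starts_new)
ind
# [1] 1 1 2 2 2
```

So 1985 and 1986 fall in the first block (ending with agreement1), and 1987, 2001 and 2002 in the second (ending with agreement2). The gaps in 'year' do not matter because only the row order is used.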

Or, instead of ifelse, we can use a numeric index:

setDT(df)[, result:=c(NA, max(event, na.rm=TRUE))[(agreement.name!='')+1L] ,
   list(ind= cumsum(c(TRUE,diff(agreement.name=='')>0)),dyad)][]
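
In case the indexing trick is not obvious, here is a small sketch of what happens inside a single group, using the second block of dyad 1 from the example data (the name pos is only for illustration):

```r
# second block of dyad 1 (years 1987, 2001, 2002), copied from the example data
event <- c(NA, 3L, NA)
agreement.name <- c("", "", "agreement2")

# the comparison gives FALSE/TRUE; adding 1L turns that into position 1 or 2
pos <- (agreement.name != "") + 1L          # 1 1 2

# position 1 picks NA, position 2 picks the block maximum
result <- c(NA, max(event, na.rm = TRUE))[pos]
result
# [1] NA NA  3
```

One thing to keep in mind with either version: if a block has no non-missing 'event' at all, max(event, na.rm = TRUE) returns -Inf with a warning, so you may want to handle that case separately.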

data

df <- structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L, 
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L, 
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA, 
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "", 
"agreement2", "", "", "agreement3", "", "", "agreement 4")), 
.Names = c("dyad", 
"year", "event", "agreement", "agreement.name"), row.names = c(NA,
-11L), class = "data.frame")

Upvotes: 1
