Creating a new column that has the value 1 conditional on the difference by group

Question

I have a data set as follows:

DT <- structure(list(year = structure(c(1993, 1993, 1993, 1997, 1997, 
1997, 1999, 1999, 1999, 2003, 2003, 2005, 2005, 2005, 2009, 2009, 
2009, 2011, 2011, 2011, 2015, 2015, 2017, 2017, 2017), comment = "year"), 
    State = structure(c("Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas"), class = "AsIs", comment = "state"), 
    State_Abbr = structure(c("KS", "KS", "KS", "KS", "KS", "KS", 
    "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", 
    "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS"), class = "AsIs", comment = "state_po"), 
    Party = structure(c("republican", "democrat", "Other", "republican", 
    "democrat", "Other", "republican", "democrat", "Other", "republican", 
    "Other", "democrat", "republican", "Other", "republican", 
    "democrat", "Other", "democrat", "republican", "Other", "republican", 
    "Other", "democrat", "republican", "Other"), class = "AsIs", comment = "party"), 
    Percentage = structure(c(0.626967802302283, 0.310289787269175, 
    0.0627424104285421, 0.620238525135418, 0.344369476385061, 
    0.035391998479521, 0.652661584410013, 0.315878201849193, 
    0.0314602137407939, 0.825223659651155, 0.174776340348845, 
    0.274872411697912, 0.691627798218281, 0.033499790083807, 
    0.600583964516102, 0.364584658335329, 0.0348313771485682, 
    0.263785496339944, 0.70094378363408, 0.0352707200259761, 
    0.531464769317622, 0.468535230682378, 0.322381278217064, 
    0.621752543886607, 0.0558661778963293), comment = "totalvotes"), 
    cyear = structure(c(1992L, 1992L, 1992L, 1996L, 1996L, 1996L, 
    1998L, 1998L, 1998L, 2002L, 2002L, 2004L, 2004L, 2004L, 2008L, 
    2008L, 2008L, 2010L, 2010L, 2010L, 2014L, 2014L, 2016L, 2016L, 
    2016L), comment = "year")), row.names = c(NA, -25L), class = c("data.table", 
"data.frame"))

A small part of the data is shown below:

I would like to create a new column, which has the row number if the difference between the year by group is larger than 2. That would be row 4, 5 and 6 in the picture.

DT <- setDT(DT)[year-shift(year)>2, newcolumn := 1 , by=c("State", "Party")]

Which I read as: If the year in the row is more than two larger as the year in the previous row (of the same group by=c("State", "Party"), the new column will get the value of 1.

Somehow however, it does not create a one for each party, but only by year, and the output is like this:

DT_new <- structure(list(year = structure(c(1993, 1993, 1993, 1997, 1997, 
1997, 1999, 1999, 1999, 2003, 2003, 2005, 2005, 2005, 2009, 2009, 
2009, 2011, 2011, 2011, 2015, 2015, 2017, 2017, 2017), comment = "year"), 
    State = structure(c("Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", "Kansas", 
    "Kansas", "Kansas", "Kansas"), class = "AsIs", comment = "state"), 
    State_Abbr = structure(c("KS", "KS", "KS", "KS", "KS", "KS", 
    "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", 
    "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS", "KS"), class = "AsIs", comment = "state_po"), 
    Party = structure(c("republican", "democrat", "Other", "republican", 
    "democrat", "Other", "republican", "democrat", "Other", "republican", 
    "Other", "democrat", "republican", "Other", "republican", 
    "democrat", "Other", "democrat", "republican", "Other", "republican", 
    "Other", "democrat", "republican", "Other"), class = "AsIs", comment = "party"), 
    Percentage = structure(c(0.626967802302283, 0.310289787269175, 
    0.0627424104285421, 0.620238525135418, 0.344369476385061, 
    0.035391998479521, 0.652661584410013, 0.315878201849193, 
    0.0314602137407939, 0.825223659651155, 0.174776340348845, 
    0.274872411697912, 0.691627798218281, 0.033499790083807, 
    0.600583964516102, 0.364584658335329, 0.0348313771485682, 
    0.263785496339944, 0.70094378363408, 0.0352707200259761, 
    0.531464769317622, 0.468535230682378, 0.322381278217064, 
    0.621752543886607, 0.0558661778963293), comment = "totalvotes"), 
    cyear = structure(c(1992L, 1992L, 1992L, 1996L, 1996L, 1996L, 
    1998L, 1998L, 1998L, 2002L, 2002L, 2004L, 2004L, 2004L, 2008L, 
    2008L, 2008L, 2010L, 2010L, 2010L, 2014L, 2014L, 2016L, 2016L, 
    2016L), comment = "year"), newcolum = c(NA, NA, NA, 1, NA, 
    NA, NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 
    1, NA, NA, NA, NA)), row.names = c(NA, -25L), class = c("data.table", 
"data.frame"))

What am I doing wrong?

Ronak Shah · Accepted Answer

You can try to check for condition in j :

library(data.table)

DT[, newcolumn := as.integer(year - shift(year) > 2), .(State, Party)]

Creating a new column that has the value 1 conditional on the difference by group

Answers (2)

Related Questions