Dummy variable for each first observation of a categorical variable (id) in R

Question

I want to create a dummy variable, first, in R that is 1 if the value of another dummy changed from 0 to 1, provided the row is not the first observation for an id number. The motivation is that I want to identify firms that entered a market during the observed time period in a panel setting.

As an example I tried to create this with a small sample set:

id <- c(1,1,1,2,2,3,3,3) 
dummy <- c(0,1,1,0,1,1,0,1)

df <- data.frame(id,dummy)
df[,"id"]


first.dum <- function(x)  
  c( x[-1,"id"] == x[,"id"]
    & x[-1,"dummy"] != x[,"dummy"]
     & x[,"dummy"] == "1")

df$first <- first.dum(df)
df

The result looks like this:

  id dummy first
1  1     0 FALSE
2  1     1 FALSE
3  1     1 FALSE
4  2     0 FALSE
5  2     1 FALSE
6  3     1  TRUE
7  3     0 FALSE
8  3     1 FALSE

I think I have not understood how data frame manipulation really works.

Any help would be appreciated.

David Arenburg · Accepted Answer

Here's how I would approach this using the data.table package:

library(data.table)
setDT(df)[, first := c(0, diff(dummy)) == 1, id][]
#    id dummy first
# 1:  1     0 FALSE
# 2:  1     1  TRUE
# 3:  1     1 FALSE
# 4:  2     0 FALSE
# 5:  2     1  TRUE
# 6:  3     1 FALSE
# 7:  3     0 FALSE
# 8:  3     1  TRUE

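Basically, within each id group we check whether dummy is exactly one higher than in the previous observation. The leading 0 stands in for the missing difference at the first observation of each group, so that row can never be flagged.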
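
To see what the expression does, it can help to run it on a single group's values outside of any grouping. This is a minimal sketch using the dummy values of id 3 from the question; the names x and entered are only illustrative.

```r
# dummy values for id 3 in the example data
x <- c(1, 0, 1)

# diff() returns the change between consecutive observations
diff(x)
# [1] -1  1

# Prepending 0 pads the result back to length(x) and ensures that
# the first observation of a group is never flagged
c(0, diff(x))
# [1]  0 -1  1

# Only a 0 -> 1 change gives a difference of exactly 1
entered <- c(0, diff(x)) == 1
entered
# [1] FALSE FALSE  TRUE
```

A 1 -> 0 change gives -1 and an unchanged value gives 0, so neither is flagged.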

You can do it similarly using dplyr:

library(dplyr)
df %>% group_by(id) %>% mutate(first = c(0, diff(dummy)) == 1)

Or using base R:

unlist(tapply(df$dummy, df$id, function(x)  c(0, diff(x)) == 1))
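
Note that the tapply() call returns the values ordered by id, so assigning the result back to df only lines up if the rows are already sorted by id. If you want the result as a column in the original row order, one possible alternative is ave(). The sketch below is not part of the approaches above; it rebuilds the example data so that it runs on its own.

```r
# Example data from the question
id <- c(1, 1, 1, 2, 2, 3, 3, 3)
dummy <- c(0, 1, 1, 0, 1, 1, 0, 1)
df <- data.frame(id, dummy)

# ave() applies FUN within each id and returns the values
# in the original row order, so the result can be assigned directly
df$first <- ave(df$dummy, df$id, FUN = function(x) c(0, diff(x))) == 1
df$first
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
```

Wrap the comparison in as.integer() if you need 0/1 values rather than TRUE/FALSE.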
