utser

Reputation: 23

Dummy variable for each first observation of a categorical variable (id) in R

Question: I want to create a dummy variable called first in R that is 1 if the value of another dummy changed from 0 to 1, provided that the observation is not the first one for its id number. The motivation is that I want to identify firms that entered a market during the observed time period in a panel setting.

As an example I tried to create this with a small sample set:

id <- c(1,1,1,2,2,3,3,3) 
dummy <- c(0,1,1,0,1,1,0,1)

df <- data.frame(id,dummy)
df[,"id"]


first.dum <- function(x)  
  c( x[-1,"id"] == x[,"id"]
    & x[-1,"dummy"] != x[,"dummy"]
     & x[,"dummy"] == "1")

df$first <- first.dum(df)
df 

The result looks like this:

  id dummy first
1  1     0 FALSE
2  1     1 FALSE
3  1     1 FALSE
4  2     0 FALSE
5  2     1 FALSE
6  3     1  TRUE
7  3     0 FALSE
8  3     1 FALSE

I think I have not understood how data frame manipulation really works.

Any help would be appreciated.

Upvotes: 2

Views: 1074

Answers (2)

David Arenburg

Reputation: 92302

Here's how I would approach this using the data.table package:

library(data.table)
setDT(df)[, first := c(0, diff(dummy)) == 1, id][]
#    id dummy first
# 1:  1     0 FALSE
# 2:  1     1  TRUE
# 3:  1     1 FALSE
# 4:  2     0 FALSE
# 5:  2     1  TRUE
# 6:  3     1 FALSE
# 7:  3     0 FALSE
# 8:  3     1  TRUE

Basically, for each id we check whether dummy is larger by one than in the previous observation. Prepending 0 to diff(dummy) pads the result to the group's length, so the first observation of each id is never flagged.
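To make the diff() step concrete, here is a minimal sketch on a single group, using the dummy values of id 3 from the question's sample data. The intermediate names changes and padded are only for illustration and are not part of the answer's code.

```r
# Dummy values for a single firm (id 3 in the question's sample data)
dummy <- c(1, 0, 1)

# diff() returns the change between consecutive observations: -1, 1
changes <- diff(dummy)

# Prepending 0 pads the result back to the original length, so the
# first observation can never be flagged as an entry
padded <- c(0, changes)

# Only a change of exactly +1 (0 -> 1) counts as an entry
first <- padded == 1
print(first)
# [1] FALSE FALSE  TRUE
```

The leading 1 for this id is not flagged, because there is no earlier observation to compare it with.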

You can do it similarly using dplyr

library(dplyr)
df %>% group_by(id) %>% mutate(first = c(0, diff(dummy)) == 1)

Or using base R

unlist(tapply(df$dummy, df$id, function(x)  c(0, diff(x)) == 1))
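If the goal is to store the result as a column, one possible variant (a swapped-in alternative, not the answer's own code) is ave(), which applies a function per group and returns the values in the original row order. The tapply() line above returns the groups concatenated in order of id, so it lines up with the rows only when the data are already sorted by id, as they are in the example.

```r
# Sample data from the question
id    <- c(1, 1, 1, 2, 2, 3, 3, 3)
dummy <- c(0, 1, 1, 0, 1, 1, 0, 1)
df    <- data.frame(id, dummy)

# ave() computes c(0, diff(x)) within each id and keeps the row order,
# so the result can be assigned directly as a new column
df$first <- ave(df$dummy, df$id, FUN = function(x) c(0, diff(x))) == 1
print(df)
```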

Upvotes: 2

Henry

Reputation: 6784

Try something like

df$first <- df$id == c(NA, df$id[-nrow(df)]) & 
            df$dummy > c(1, df$dummy[-nrow(df)]) 

to give

> df
  id dummy first
1  1     0 FALSE
2  1     1  TRUE
3  1     1 FALSE
4  2     0 FALSE
5  2     1  TRUE
6  3     1 FALSE
7  3     0 FALSE
8  3     1  TRUE

If you want something like your function, consider

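first.dum <- function(x) {
    # y is x shifted down by one row; the padding row has id NA and dummy 1
    y <- rbind(c(NA, 1), x[-nrow(x), ])
    # TRUE where the id is unchanged and dummy rose from 0 to 1
    x[, "id"] == y[, "id"] & x[, "dummy"] > y[, "dummy"]
}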
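As a usage sketch, the function can be applied to the question's sample data as follows. This assumes the data frame has exactly the two columns id and dummy, in that order, since the padding row c(NA, 1) is matched to the columns by position.

```r
# Sample data from the question
id    <- c(1, 1, 1, 2, 2, 3, 3, 3)
dummy <- c(0, 1, 1, 0, 1, 1, 0, 1)
df    <- data.frame(id, dummy)

first.dum <- function(x) {
    # Shift the data down by one row; the padding row (id NA, dummy 1)
    # guarantees the very first observation is never flagged
    y <- rbind(c(NA, 1), x[-nrow(x), ])
    x[, "id"] == y[, "id"] & x[, "dummy"] > y[, "dummy"]
}

# Compute the flag before adding the column, so x still has two columns
df$first <- first.dum(df)
print(df)
```

The comparison with NA in the first row yields NA, but NA & FALSE evaluates to FALSE in R, so the column contains no missing values.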

Upvotes: 2
