Lingyu Kong
Lingyu Kong

Reputation: 85

How to find common elements across multiple years? (select same entities across a time period)

I have a DF like following across time period from 1996 to 2016 with different firms:

year firms
----------
1996  a
1996  b
1996  c
.......
2016  c

My question is how can I select the firms that across the whole time period from 1996 to 2016? In other words, I would like to setup a balanced panel from an unbalanced panel?

The only way I can do so far is like:

Reduce(intersect, list(a,b,c))

if I extract the firms into multiple vectors according to the years. But it's obviously too fussy.

Upvotes: 0

Views: 44

Answers (2)

Sotos
Sotos

Reputation: 51592

You can use table and match the elements with the same length with the length of unique years, i.e.

table(df$firm)

#a b c 
#5 3 3

table(df$firm) == length(unique(df$year))

#    a     b     c 
# TRUE FALSE FALSE 

t1 <- table(df$firm) == length(unique(df$year))
names(t1)[t1]

#[1] "a"

df[df$firm %in% names(t1)[t1],]

#   year firm
#1  1996    a
#4  1997    a
#7  1998    a
#10 1999    a
#13 2000    a

DATA

dput(df)
structure(list(year = c(1996L, 1996L, 1996L, 1997L, 1997L, 1998L, 
1998L, 1999L, 2000L, 2000L, 2000L), firm = c("a", "b", "c", "a", 
"b", "a", "c", "a", "a", "b", "c")), .Names = c("year", "firm"
), row.names = c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 10L, 13L, 14L, 15L
), class = "data.frame")

Upvotes: 0

amatsuo_net
amatsuo_net

Reputation: 2448

The following code first find the name of firms with data entries in all years then subset the data

library(data.table)
#generate sample data
set.seed(1)
dt <- data.table(year = sample(1996:2016, 500, TRUE), 
                 firms = sample(letters[1:10], 500, TRUE))
dt <- dt[!duplicated(dt)][order(year, firms)]
print(dt)

# find the common element
common_element <- dt[, length(unique(year)) == length(1996:2016), by = firms][V1 == TRUE, firms]
print(common_element)

## [1] "a" "j"

# subset the data
dt_subset <- dt[firms %in% common_element]

Upvotes: 0

Related Questions