Reputation: 85
I have a DF like following across time period from 1996 to 2016 with different firms:
year firms
----------
1996 a
1996 b
1996 c
.......
2016 c
My question is how can I select the firms that across the whole time period from 1996 to 2016? In other words, I would like to setup a balanced panel from an unbalanced panel?
The only way I can do so far is like:
Reduce(intersect, list(a,b,c))
if I extract the firms into multiple vectors according to the years. But it's obviously too fussy.
Upvotes: 0
Views: 44
Reputation: 51592
You can use table and match the elements with the same length with the length of unique years, i.e.
table(df$firm)
#a b c
#5 3 3
table(df$firm) == length(unique(df$year))
# a b c
# TRUE FALSE FALSE
t1 <- table(df$firm) == length(unique(df$year))
names(t1)[t1]
#[1] "a"
df[df$firm %in% names(t1)[t1],]
# year firm
#1 1996 a
#4 1997 a
#7 1998 a
#10 1999 a
#13 2000 a
DATA
dput(df)
structure(list(year = c(1996L, 1996L, 1996L, 1997L, 1997L, 1998L,
1998L, 1999L, 2000L, 2000L, 2000L), firm = c("a", "b", "c", "a",
"b", "a", "c", "a", "a", "b", "c")), .Names = c("year", "firm"
), row.names = c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 10L, 13L, 14L, 15L
), class = "data.frame")
Upvotes: 0
Reputation: 2448
The following code first find the name of firms with data entries in all years then subset the data
library(data.table)
#generate sample data
set.seed(1)
dt <- data.table(year = sample(1996:2016, 500, TRUE),
firms = sample(letters[1:10], 500, TRUE))
dt <- dt[!duplicated(dt)][order(year, firms)]
print(dt)
# find the common element
common_element <- dt[, length(unique(year)) == length(1996:2016), by = firms][V1 == TRUE, firms]
print(common_element)
## [1] "a" "j"
# subset the data
dt_subset <- dt[firms %in% common_element]
Upvotes: 0