iheartr
iheartr

Reputation: 13

R: dropping all observations within a group

I have a set of 500 firms with investment data over six years. I want to create a balanced panel of ONLY the firms that have data for all six years, but I can't seem to find an easy way to do this in R.

My data looks like:

IMAGE

I want to drop a firm for all years if it has missing values for just one year. I've tried different conditional subsetting commands but nothing is working for me. Ideas?

Upvotes: 0

Views: 71

Answers (1)

r2evans
r2evans

Reputation: 160417

Three methods:

dat <- data.frame(
  firm = c(1, 1, 1, 2,  2,  2),
  year = c(1, 2, 3, 1,  2,  3),
  val  = c(1, 1, 1, 2, NA, NA)
)

Base R

do.call(
  rbind.data.frame,
  by(dat, dat$firm, function(x) if (!any(is.na(x$val))) x)
)
#     year firm val
# 1.1    1    1   1
# 1.2    2    1   1
# 1.3    3    1   1

or

datrm <- dat[!complete.cases(dat),]
dat[ !dat$firm %in% datrm$firm, ]

data.table

library(data.table)
datDT <- as.data.table(dat)
datDT[, .SD[!any(is.na(val)),], by = "firm" ]

dplyr

library(dplyr)
dat %>%
  group_by(firm) %>%
  filter(!any(is.na(val))) %>%
  ungroup()

Upvotes: 3

Related Questions