Novic


How to drop observations of groups with fewer than a given number of values in R?

Here is a simple example. It may be trivial, but I have no idea how to write the code for it.

I have a panel dataset with two variables, date and company, along with some other variables:

date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")

Not every company is traded every day, so I want to keep only the data for companies that have been traded more than a given number of times, for example more than 4 times. In this example there are 6 days and 5 companies, so companies "d" and "e" (traded 3 and 2 times, respectively) should be deleted.
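A quick way to confirm which companies cross the threshold is to count the rows per company with base R's table(). This is a small sketch using the question's vectors; the names counts and keep are only illustrative.

```r
# Vectors from the question
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")

# Number of rows (trading days) per company: a = 6, b = 6, c = 5, d = 3, e = 2
counts <- table(company)
print(counts)

# Companies traded more than 4 times
keep <- names(counts)[counts > 4]
print(keep)  # [1] "a" "b" "c"
```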


Answers (1)

MKR


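One option is to use dplyr::filter together with group_by. In a grouped data frame, n() returns the number of rows in the current group. So after group_by(company), n() is the number of times each company traded, and filter(n() > 4) keeps only the rows of companies that traded more than 4 times.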
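To see what n() evaluates to for each row, here is a base R sketch (a swapped-in technique, not part of dplyr) that attaches the same per-group count as a column. The column name group_n is illustrative only.

```r
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")
df <- data.frame(date, company)

# ave() applies FUN within each company and recycles the result onto every
# row of that company, so group_n holds the value n() would give for the
# row's group after group_by(company)
df$group_n <- ave(seq_len(nrow(df)), df$company, FUN = length)

# First five rows are companies a to e, with group sizes 6, 6, 5, 3, 2
head(df, 5)
```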

# Data from the question
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <-c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
           "b","c","a","b","c","d","e")
df <- data.frame(date, company)

library(dplyr)

df %>% group_by(company) %>%
  filter(n() > 4)            #subset companies traded for more than 4 times

# Result: d and e are dropped because their counts (n() = 3 and 2) are not greater than 4
# # A tibble: 17 x 2
# # Groups:   company [3]
#     date company
#    <dbl> <fctr>
#  1  1.00 a
#  2  1.00 b
#  3  1.00 c
#  4  2.00 a
#  5  2.00 b
#  6  2.00 c
#  7  3.00 a
#  8  3.00 b
#  9  4.00 a
# 10  4.00 b
# 11  4.00 c
# 12  5.00 a
# 13  5.00 b
# 14  5.00 c
# 15  6.00 a
# 16  6.00 b
# 17  6.00 c
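If you need the threshold as a parameter, or want to avoid the dplyr dependency, a base R sketch is below. drop_rare_groups is a hypothetical helper, not a function from any package; like filter(n() > 4), it keeps groups with strictly more than min_n rows and carries all other columns along.

```r
# Hypothetical helper: keep rows whose value in column `group` occurs
# more than `min_n` times; every other column is kept unchanged
drop_rare_groups <- function(data, group, min_n) {
  group_n <- ave(seq_len(nrow(data)), data[[group]], FUN = length)
  data[group_n > min_n, , drop = FALSE]
}

date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")
df <- data.frame(date, company)

res4 <- drop_rare_groups(df, "company", 4)  # 17 rows: companies a, b, c
res5 <- drop_rare_groups(df, "company", 5)  # 12 rows: only a and b (c has exactly 5)
```

Because the comparison is strict, raising the threshold to 5 also removes company "c".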

