I want to work through a simple example. It may be trivial, but I have no idea how to write the code for it.
There is a panel dataset with two variables, date and company, followed by some other variables:
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <-c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a","b","c","a","b","c","d","e")
Not every company is traded every day, so I want to keep only the rows for companies that have been traded more than a given number of times, for example more than 4. In this example there are 6 days and 5 companies, and companies "d" and "e" are the ones that should be deleted.
One option is to use dplyr::filter with group_by. Inside a grouped data frame, n() gives the number of rows in the current group, so after group_by on company, n() returns the number of times each company was traded.
#data
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <-c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
"b","c","a","b","c","d","e")
df <- data.frame(date, company)
library(dplyr)
df %>% group_by(company) %>%
  filter(n() > 4) # keep only companies traded more than 4 times
# Result: d and e do not appear because their counts (n()) are not greater than 4
# # A tibble: 17 x 2
# # Groups: company [3]
# date company
# <dbl> <fctr>
# 1 1.00 a
# 2 1.00 b
# 3 1.00 c
# 4 2.00 a
# 5 2.00 b
# 6 2.00 c
# 7 3.00 a
# 8 3.00 b
# 9 4.00 a
# 10 4.00 b
# 11 4.00 c
# 12 5.00 a
# 13 5.00 b
# 14 5.00 c
# 15 6.00 a
# 16 6.00 b
# 17 6.00 c
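To see why d and e drop out, it can help to look at the per-company row counts directly. The following is a minimal sketch in base R (no dplyr needed) that prints those counts and then applies the same "more than 4 rows" rule with ave(). The names counts, n_per_row and kept are introduced here for illustration only and are not part of the original answer.

```r
# Same data as in the question
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")
df <- data.frame(date, company)

# Number of rows per company; this is the value n() takes within each group
counts <- table(df$company)
print(counts)

# ave() repeats each company's row count on every row of that company,
# which gives a row-level vector that can be used for subsetting
n_per_row <- ave(seq_along(df$company), df$company, FUN = length)

# Keep only rows whose company has more than 4 rows (illustrative name: kept)
kept <- df[n_per_row > 4, ]
print(nrow(kept))
```

With this data the counts are 6 for a, 6 for b, 5 for c, 3 for d and 2 for e, so kept has the same 17 rows as the dplyr result above. Note that both versions count rows, so if a company could appear more than once on the same date, the count would no longer equal the number of trading days.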