nsardar
nsardar

Reputation: 1

Selecting only particular rows in R(dataframes)

I have a csv file with 3 columns year,netInc,Organization. I want to clean the data and consider only the Organization whose values are present in all the financial years . For instance I want to consider all years between 2000 and 2003 (count = 4) and don't want data of B and D as it has data for few years only . So how should I do this to get new dataframe having data for 2000 to 2003 and having 50k observations?

Year netInc Org
2000 10     A
2001 20     A
2002 30     A
2003 40     A
2000  5     B
2001  7     B
2000 10     C
2001 20     C
2002 30     C
2003 40     C
2010 10     D

Upvotes: 0

Views: 96

Answers (1)

akrun
akrun

Reputation: 887851

We can use a group by filter i.e. grouped by 'Org', filter only those 'Org' having all the 'Year' from 2000 to 2003

library(dplyr)
df1 %>%
    group_by(Org) %>% 
    filter(all(2000:2003 %in% Year)) %>% 
    ungroup

-output

# A tibble: 8 x 3
   Year netInc Org  
  <int>  <int> <chr>
1  2000     10 A    
2  2001     20 A    
3  2002     30 A    
4  2003     40 A    
5  2000     10 C    
6  2001     20 C    
7  2002     30 C    
8  2003     40 C    

data

df1 <- structure(list(Year = c(2000L, 2001L, 2002L, 2003L, 2000L, 2001L, 
2000L, 2001L, 2002L, 2003L, 2010L), netInc = c(10L, 20L, 30L, 
40L, 5L, 7L, 10L, 20L, 30L, 40L, 10L), Org = c("A", "A", "A", 
"A", "B", "B", "C", "C", "C", "C", "D")), 
class = "data.frame", row.names = c(NA, 
-11L))

Upvotes: 3

Related Questions