Reputation: 262
I have a dataframe with over 4000 columns and 3790 rows. Each column represents a company and each row a daily observation; the 3790 rows cover 15 years of daily data, from Jan-2000 to Dec-2014. I want to check whether each column has at least 100 daily observations with a positive value over that sample. In short, I want to filter out of my sample the companies that have fewer than 100 positive observations out of 3790. My data contains missing values because companies were listed at different points in time. For instance, a company listed in 2003 has all NAs before 2003. The structure of my dataframe is illustrated below:
Date A B C
30/12/1999 79.5 325 NA
04/01/2000 79.5 325 NA
05/01/2000 79.5 322.5 NA
06/01/2000 79.5 327.5 NA
07/01/2000 79.5 327.5 NA
10/01/2000 79.5 327.5 NA
11/01/2000 79.5 327.5 NA
12/01/2000 79.5 331.5 NA
13/01/2000 79.5 334 NA
14/01/2000 79.5 334 NA
17/01/2000 94.5 350 NA
18/01/2000 95.5 351.5 NA
19/01/2000 94.5 352.5 NA
20/01/2000 97.5 352.5 NA
21/01/2000 97.5 352.5 NA
24/01/2000 97.5 352.5 NA
25/01/2000 97.5 352.5 NA
I would appreciate your help in this regard.
Upvotes: 0
Views: 204
Reputation: 887108
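We can use Filter, which keeps only the columns of df1 for which the function returns TRUE. The question asks for 100 or more positive observations, so the comparison is >= 100. Missing values are excluded from the count:
Filter(function(x) sum(x > 0 & !is.na(x)) >= 100, df1)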
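As a rough sketch of the same idea on toy data, the counts can also be computed with colSums while leaving the Date column out of the test. The data frame below, the min_obs name and the threshold of 3 are assumptions made for illustration only; for the real data the threshold would be 100.

```r
# Toy data shaped like the question's sample (values are illustrative)
df1 <- data.frame(
  Date = c("30/12/1999", "04/01/2000", "05/01/2000", "06/01/2000"),
  A = c(79.5, 79.5, 79.5, 79.5),
  B = c(325, 325, NA, -1),
  C = c(NA, NA, NA, 5)
)

# Hypothetical threshold for the toy data; use 100 for the real sample
min_obs <- 3

# Count positive, non-missing observations per company column (Date excluded)
pos_counts <- colSums(df1[-1] > 0, na.rm = TRUE)

# Keep Date plus every company that meets the threshold
keep <- names(pos_counts)[pos_counts >= min_obs]
df_filtered <- df1[c("Date", keep)]
```

Here only A reaches the threshold, so df_filtered contains Date and A. Keeping pos_counts around also makes it easy to inspect how many positive observations each dropped company had.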
Upvotes: 1