Reputation: 724
I have a data frame with 15 columns and 75,000 rows. There are approximately 25,000 unique values for column 1.
I want a new data frame copied from this where the value in column 1 appears 365 times and the value in columns 10 and 11 are not equal to -999.
Smaller example:
df <- data.frame(a=c(1,1,1,2,2,2,3,3,4), b=c(4,5,6,4,5,6,4,5,6), c=c(7,8,9,-999,8,9,7,8,9),d=c(2,3,4,2,3,-999,2,3,4))
a b c d
1 1 4 7 2
2 1 5 8 3
3 1 6 9 4
4 2 4 -999 2
5 2 5 8 3
6 2 6 9 -999
7 3 4 7 2
8 3 5 8 3
9 4 6 9 4
For example I want the values that show up in column a 3 times (1,2,3) AND don't have -999 in columns c or d.
a b c d
1 1 4 7 2
2 1 5 8 3
3 1 6 9 4
I've tried using something like tmp <- table(df[df$c != -999,"a"])
but that only gives me the frequency. How can I copy the data out?
Upvotes: 2
Views: 2088
Reputation: 93938
This might be a good case for the application of data.table
to express the selection arguments clearly:
library(data.table)
df <- data.table(df)
df[,if(.N>=3 & !any(c==-999)) .SD, by=a]
Result:
a b c d
1: 1 4 7 2
2: 1 5 8 3
3: 1 6 9 4
Upvotes: 2
Reputation: 15425
You're halfway there. First, subset out those names which have frequencies greater than 3 and then find all elements of df$a
in those names:
tmp3 <- names(tmp)[tmp >= 3]
df[df$a %in% tmp3,]
a b c d
1 1 4 7 2
2 1 5 8 3
3 1 6 9 4
Upvotes: 2