Aman Chawla
Aman Chawla

Reputation: 724

Copy rows where data equals a certain value and repeats n number of times

I have a data frame with 15 columns and 75,000 rows. There are approximately 25,000 unique values for column 1.

I want a new data frame copied from this where the value in column 1 appears 365 times and the value in columns 10 and 11 are not equal to -999.

Smaller example:

df <- data.frame(a=c(1,1,1,2,2,2,3,3,4), b=c(4,5,6,4,5,6,4,5,6), c=c(7,8,9,-999,8,9,7,8,9),d=c(2,3,4,2,3,-999,2,3,4))
  a b    c    d
1 1 4    7    2
2 1 5    8    3
3 1 6    9    4
4 2 4 -999    2
5 2 5    8    3
6 2 6    9 -999
7 3 4    7    2
8 3 5    8    3
9 4 6    9    4

For example I want the values that show up in column a 3 times (1,2,3) AND don't have -999 in columns c or d.

  a b    c    d
1 1 4    7    2
2 1 5    8    3
3 1 6    9    4

I've tried using something like tmp <- table(df[df$c != -999,"a"]) but that only gives me the frequency. How can I copy the data out?

Upvotes: 2

Views: 2088

Answers (2)

thelatemail
thelatemail

Reputation: 93938

This might be a good case for the application of data.table to express the selection arguments clearly:

library(data.table)
df <- data.table(df)
df[,if(.N>=3 & !any(c==-999)) .SD, by=a]

Result:

   a b c d
1: 1 4 7 2
2: 1 5 8 3
3: 1 6 9 4

Upvotes: 2

sebastian-c
sebastian-c

Reputation: 15425

You're halfway there. First, subset out those names which have frequencies greater than 3 and then find all elements of df$a in those names:

tmp3 <- names(tmp)[tmp >= 3]
df[df$a %in% tmp3,]

  a b c d
1 1 4 7 2
2 1 5 8 3
3 1 6 9 4

Upvotes: 2

Related Questions