anrpet
anrpet

Reputation: 139

How to group and extract rows with condition

I have a data frame (df) that looks like this:

 X1        Category  total.count 
100279         A1        1 
100279         A2        1
100279         A3        1
100279         A4        1
100280         A1        1
100280         A2        4
100281         A1        1
100281         A2        1
100282         A1        7
100283         A2        1
100283         A3        1
100283         A4        1

I want to extract a list of the id numbers who are assigned soley 1s in total.count.

 X1
 100279
 100281
 100283

I have tried:

df2 = df[total.count == 1]

but it just returns the same thing with only the rows where total.count is equivalent to 1 :

  X1      total.count 
100279    1 
100279    1
100279    1
100279    1
100280    1
100281    1
100281    1
100283    1
100283    1
100283    1

Any ideas?

Upvotes: 1

Views: 50

Answers (4)

thelatemail
thelatemail

Reputation: 93938

A base R approach using ave, to find groups where all values ==1:

unique(dat[ave(dat$total.count==1, dat$X1, FUN=all),"X1"])
#[1] 100279 100281 100283

Upvotes: 1

Damian
Damian

Reputation: 1433

Using Dplyr

A readable option, if you can use packages

library(dplyr)

df %>%
    group_by(X1) %>%
    summarize(wanted = all(total.count == 1)) %>%
    filter(wanted) %>%
    select(X1) %>%
    c()

$X1
[1] 100279 100281 100283

Or, Base R

If you prefer to use base R, here is one possibility:

unwanted <- as.integer(gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)))

unwanted
[1] 100280 100282

# Wanted IDs
unique( df$X1[! df$X1 %in% unwanted] )
[1] 100279 100281 100283

Unpacking that one liner into logical steps:

# Condition for rows with the correct number
df$total.count == 1
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# Combinations of ID + condition for each row
unique(paste(df$X1, df$total.count == 1, sep = ","))
[1] "100279,TRUE"  "100280,TRUE"  "100280,FALSE" "100281,TRUE"  "100282,FALSE" "100283,TRUE" 

# Failing combinations
grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)
[1] "100280,FALSE" "100282,FALSE"

# ID numbers associated with failing combinations
gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE))
[1] "100280" "100282"

Upvotes: 0

Yannis Vassiliadis
Yannis Vassiliadis

Reputation: 1709

For a data.table approach, you could do:

library(data.table)

setDT(df)[, which(all(total.count==1)), by=X1]

       X1 V1
1: 100279  1
2: 100281  1
3: 100283  1

Also, if df$total.count doesn't have 0s, you could also use:

setDT(df)[, which(sum(total.count)==length(total.count)), by=X1]

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76661

Try the following.

sp <- split(df, df$X1)
inx <- unlist(lapply(sp, function(x) !duplicated(x$total.count == 1) & all(x$total.count == 1)))
df[inx, ]
       X1 Category total.count
1  100279       A1           1
7  100281       A1           1
10 100283       A2           1

Upvotes: 1

Related Questions