Reputation: 139
I have a data frame (df) that looks like this:
X1 Category total.count
100279 A1 1
100279 A2 1
100279 A3 1
100279 A4 1
100280 A1 1
100280 A2 4
100281 A1 1
100281 A2 1
100282 A1 7
100283 A2 1
100283 A3 1
100283 A4 1
I want to extract a list of the id numbers who are assigned soley 1s in total.count.
X1
100279
100281
100283
I have tried:
df2 = df[total.count == 1]
but it just returns the same thing with only the rows where total.count is equivalent to 1 :
X1 total.count
100279 1
100279 1
100279 1
100279 1
100280 1
100281 1
100281 1
100283 1
100283 1
100283 1
Any ideas?
Upvotes: 1
Views: 50
Reputation: 93938
A base R approach using ave
, to find groups where all
values ==1
:
unique(dat[ave(dat$total.count==1, dat$X1, FUN=all),"X1"])
#[1] 100279 100281 100283
Upvotes: 1
Reputation: 1433
A readable option, if you can use packages
library(dplyr)
df %>%
group_by(X1) %>%
summarize(wanted = all(total.count == 1)) %>%
filter(wanted) %>%
select(X1) %>%
c()
$X1
[1] 100279 100281 100283
If you prefer to use base R, here is one possibility:
unwanted <- as.integer(gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)))
unwanted
[1] 100280 100282
# Wanted IDs
unique( df$X1[! df$X1 %in% unwanted] )
[1] 100279 100281 100283
Unpacking that one liner into logical steps:
# Condition for rows with the correct number
df$total.count == 1
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
# Combinations of ID + condition for each row
unique(paste(df$X1, df$total.count == 1, sep = ","))
[1] "100279,TRUE" "100280,TRUE" "100280,FALSE" "100281,TRUE" "100282,FALSE" "100283,TRUE"
# Failing combinations
grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)
[1] "100280,FALSE" "100282,FALSE"
# ID numbers associated with failing combinations
gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE))
[1] "100280" "100282"
Upvotes: 0
Reputation: 1709
For a data.table
approach, you could do:
library(data.table)
setDT(df)[, which(all(total.count==1)), by=X1]
X1 V1
1: 100279 1
2: 100281 1
3: 100283 1
Also, if df$total.count
doesn't have 0s, you could also use:
setDT(df)[, which(sum(total.count)==length(total.count)), by=X1]
Upvotes: 1
Reputation: 76661
Try the following.
sp <- split(df, df$X1)
inx <- unlist(lapply(sp, function(x) !duplicated(x$total.count == 1) & all(x$total.count == 1)))
df[inx, ]
X1 Category total.count
1 100279 A1 1
7 100281 A1 1
10 100283 A2 1
Upvotes: 1