Reputation: 2267
I have a dataframe like this
ID <- c ("ABC_10","AZM_11","ABC_11","ABC_12",
"ABC_13","AZM_12","ABC_14","ABC_15",
"CZX_10","CZX_11","CZX_12","CZX_13",
"FIN_10","FIN_11","FIN_12","FIN_13",
"FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
245,390,639,661,
545,580,539,261,
345,180,139,261,
1045,1580,39,161)
df <- data.frame(ID,Id.n)
I am trying to subset this dataframe with the following condition
Threshold of ID.n's for FXS - 100
Threshold of ID.n's for FIN - 200
Threshold of ID.n's for all Other ID's - 300
My desired output is
ID Id.n
ABC_10 345
AZM_11 380
ABC_11 339
ABC_12 361
AZM_12 390
ABC_14 639
ABC_15 661
CZX_10 545
CZX_11 580
CZX_12 539
FIN_10 345
FIN_13 261
FNM_10 1045
FNM_11 1580
FXS_11 161
I am trying to do it this way but just not getting it right.
df <- subset(df,ifelse(grepl("FXS",df$ID), df$ID.n > 100,))
Can someone point me in the right direction?
Upvotes: 2
Views: 75
Reputation: 32548
df[(grepl("FXS",df$ID) & df$Id.n >= 100) |
(grepl("FIN",df$ID) & df$Id.n >= 200) |
(!(grepl("FXS",df$ID) | grepl("FIN", df$ID)) & df$Id.n >= 300),]
# ID Id.n
#1 ABC_10 345
#2 AZM_11 380
#3 ABC_11 339
#4 ABC_12 361
#6 AZM_12 390
#7 ABC_14 639
#8 ABC_15 661
#9 CZX_10 545
#10 CZX_11 580
#11 CZX_12 539
#13 FIN_10 345
#16 FIN_13 261
#17 FNM_10 1045
#18 FNM_11 1580
#20 FXS_11 161
Upvotes: 1
Reputation: 66819
This is simpler with sanitized data. Using data.table, that'd look like...
library(data.table)
setDT(df)
df[, c("x", "y") := tstrsplit(ID, "_")][, ID := NULL ]
xDT = data.table(x = unique(df$x))
xDT[, th := 300 ]
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on=.(x), th := i.th ]
Then a non equi join works to filter:
df[xDT, on=.(x, Id.n > th)]
Id.n x y
1: 300 ABC 11
2: 300 ABC 10
3: 300 ABC 12
4: 300 ABC 14
5: 300 ABC 15
6: 300 AZM 11
7: 300 AZM 12
8: 300 CZX 12
9: 300 CZX 10
10: 300 CZX 11
11: 200 FIN 13
12: 200 FIN 10
13: 300 FNM 10
14: 300 FNM 11
15: 100 FXS 11
Regarding grepl
here, I think it is a bad idea.
Upvotes: 4
Reputation: 1359
Using dplyr
:
library(dplyr)
df2 <- df %>%
filter((grepl("FXS", ID) & Id.n > 100) |
(grepl("FIN", ID) & Id.n > 200) |
(!grepl("FXS|FIN", ID) & Id.n > 300))
df2
# ID Id.n
# ABC_10 345
# AZM_11 380
# ABC_11 339
# ABC_12 361
# AZM_12 390
# ABC_14 639
# ABC_15 661
# CZX_10 545
# CZX_11 580
# CZX_12 539
# FIN_10 345
# FIN_13 261
# FNM_10 1045
# FNM_11 1580
# FXS_11 161
Upvotes: 4