Reputation: 899
My dataset contains multiple columns, among them are 6 called ID1, ID2, ID3, ID13, ID23, ID33 containing a multitude of ID's. My objective is to check which rows contain certain values (values STARTING with 531 OR 532 OR 533 OR 534 in this case) and subset those into a new dataset.
This code subsets the data for all rows where the ID in any of these columns is exactly 531, but I am at a loss as to how to make it so it checks just values STARTING with 531, and also how to make it check for all 4 starting values: 531, 532, 533 and 534 (without using a ton more conditional statements)
ID <- data[which(data$ID1 == '531' | data$ID2 == '531' | data$ID3 == '531' | data$ID13 == '531' | data$ID23 == '531' | data$ID33 == '531'), ]
I am basically looking for a simpler way to do this.
Here is sample data and my desired output:
Sample Data
ID1 ID2 ID3 ID13 ID23 ID33
43645 38894 37338 45617 54419 42716
53123 24277 5323 29160
22821 26091 46100 65324 22179 12367
47946 59421
1954 9544
63534 53244 4771
59080 32532 41460 34562 15722
30542 11477 13211 11739 26753
29331 5311 7079 534424 5322
Desired Output
ID1 ID2 ID3 ID13 ID23 ID33
53123 24277 5323 29160
63534 53244 4771
29331 5311 7079 534424 5322
I apologize for the formatting (some data points are indeed missing)
Thanks
Upvotes: 0
Views: 983
Reputation: 19803
While your question is a bit unclear (I'm not sure what you want to do with these rows after this.. you can select rows where all id columns have a TRUE
using all
or if at least one using any
):
require(data.table)
set.seed(123)
dt = data.table(id1 = sample(525:535), id2=sample(525:535), id3=sample(525:535))
dt2=dt[,lapply(.SD, FUN = function(z) {z >= 531})]
# id1 id2 id3
# 1: FALSE FALSE TRUE
# 2: TRUE TRUE TRUE
# 3: TRUE FALSE FALSE
# 4: TRUE FALSE TRUE
# 5: TRUE TRUE FALSE
# 6: FALSE FALSE TRUE
# 7: FALSE TRUE FALSE
# 8: TRUE TRUE FALSE
# 9: FALSE FALSE FALSE
#10: FALSE FALSE FALSE
#11: FALSE TRUE TRUE
If you want to keep rows that have at least 1 TRUE
(that would seem to be the case since in your snippet you do |
), then you can simply do.
dt[dt2[,rowSums(.SD)>=1]]
# id1 id2 id3
#1: 528 529 532
#2: 532 531 534
#3: 535 530 530
#4: 534 525 533
#5: 531 534 528
#6: 525 526 531
#7: 527 532 526
#8: 533 533 525
#9: 529 535 535
Upvotes: 2