Mixcels
Mixcels

Reputation: 899

Using a loop to check if certain values exist within multiple columns

My dataset contains multiple columns, among them are 6 called ID1, ID2, ID3, ID13, ID23, ID33 containing a multitude of ID's. My objective is to check which rows contain certain values (values STARTING with 531 OR 532 OR 533 OR 534 in this case) and subset those into a new dataset.

This code subsets the data for all rows where the ID in any of these columns is exactly 531, but I am at a loss as to how to make it so it checks just values STARTING with 531, and also how to make it check for all 4 starting values: 531, 532, 533 and 534 (without using a ton more conditional statements)

ID <- data[which(data$ID1 == '531' | data$ID2 == '531' | data$ID3 == '531' | data$ID13 == '531' | data$ID23 == '531' | data$ID33 == '531'), ]

I am basically looking for a simpler way to do this.

Here is sample data and my desired output:

Sample Data                 
ID1     ID2     ID3     ID13    ID23    ID33
43645   38894   37338   45617   54419   42716
53123   24277   5323    29160       
22821   26091   46100   65324   22179   12367
        47946           59421   
                1954            9544
63534           53244           4771
59080   32532   41460   34562           15722
30542   11477   13211   11739           26753
29331           5311    7079    534424  5322

Desired Output                  
ID1     ID2     ID3     ID13    ID23    ID33
53123   24277   5323    29160       
63534           53244           4771
29331           5311    7079    534424  5322

I apologize for the formatting (some data points are indeed missing)

Thanks

Upvotes: 0

Views: 983

Answers (1)

Alex
Alex

Reputation: 19803

While your question is a bit unclear (I'm not sure what you want to do with these rows after this.. you can select rows where all id columns have a TRUE using all or if at least one using any):

require(data.table)
set.seed(123)
dt = data.table(id1 = sample(525:535), id2=sample(525:535), id3=sample(525:535))
dt2=dt[,lapply(.SD, FUN = function(z) {z >= 531})]
#      id1   id2   id3
# 1: FALSE FALSE  TRUE
# 2:  TRUE  TRUE  TRUE
# 3:  TRUE FALSE FALSE
# 4:  TRUE FALSE  TRUE
# 5:  TRUE  TRUE FALSE
# 6: FALSE FALSE  TRUE
# 7: FALSE  TRUE FALSE
# 8:  TRUE  TRUE FALSE
# 9: FALSE FALSE FALSE
#10: FALSE FALSE FALSE
#11: FALSE  TRUE  TRUE

If you want to keep rows that have at least 1 TRUE (that would seem to be the case since in your snippet you do |), then you can simply do.

dt[dt2[,rowSums(.SD)>=1]]
#   id1 id2 id3
#1: 528 529 532
#2: 532 531 534
#3: 535 530 530
#4: 534 525 533
#5: 531 534 528
#6: 525 526 531
#7: 527 532 526
#8: 533 533 525
#9: 529 535 535

Upvotes: 2

Related Questions