Is there a way to select rows in SAS that contain strings based on values from another dataset?

Question

So I have two datasets. Dataset1 has a column "tags" that contains numbers separated by +. It looks like this:

id   complete   tags
1    true       11+27+5672+26+108+14
2    false      27+5672+26+2675+25+16083945+18567+1022+58157+23+16+11
3    true       13+27+5672+2675+26+238895+23+16
4    true       10+27+5672+163333+26+25+43869595+234149+18567+19390192+13675649+23+16
5    true       10+27+5672+26+2675+25+481056+106602+54874998+54875001+110+23+16
.
.

My second dataset has a list of "tags" like so:

id       name           count
+968+    TagName968     39421
+2675+   TagName2675    7450
.
.

What I would like to do is to go through and select all of the rows from Dataset1 where any of the tags in the tags column match any of the id rows in Dataset2. So basically any rows in Dataset1 that contain "+968+" or "+2675+" or any of the other ~3000 ids (so I can't hardcode them all in). So if I ran it based on the example above, rows 2, 3, and 5 would all be selected because they contain "+2675+".

I have tried different versions of DATA steps and PROC SQL, but I'm not sure I'm doing them right. The problem with PROC SQL is that I'm not really looking for a join, I don't think? Or at least it would have to be one-to-many.

I was thinking I might need to separate out the "tags" column so there's one tag per column, but the problem is that each row can have a wildly different number of tags (many of them have a lot more than given in the example) so I'm not sure that would be the best choice.

Is there a way to select rows in SAS that contain strings based on values from another dataset?

Answers (1)

Related Questions