Reputation: 5054
Got the following example
I'm trying to know if any part of string in the column nomvar
in table tata does exist in col1 in table toto and if yes, give me the definition using col2.
For I2010,RT,IS-IPI,F_CC11_X_CCXBA, I would have in the column intitule
"yes,toto,tata,well"
I thought about using a proc sql
with an insert and a select but I have two tables and I would need to do a join.
In the same time, I thought to have everything in one table but I'm unsure if it is a good idea.
Any suggestions are welcomed as I'm deeply stuck.
Upvotes: 0
Views: 203
Reputation: 63434
Assuming your data is all structured like this (you're looking at the different strings in between .
characters) I would think the easiest way is to normalize TATA
(splitting by .
) and then doing a straight join, then (if you need to) transposing back. (It might be better to leave it vertical - very likely you would find this more useful structure for analysis.)
data tata_v;
set tata;
call scan(nomvar,1,position,length,'.');
do _i = 1 by 1 while position le 0);
nomvar_out = substr(nomvar,position,length);
output;
call scan(nomvar,_i+1,position,length,'.');
end;
run;
Now you can join on nomvar_out
and then (if needed) recombine things.
Upvotes: 2
Reputation: 334
The SAS data step hash object is a nice way to do this. It allows you to read the Toto table into memory and it becomes a lookup table for you. Then you just walk the string from the Tata table using the scan function, tokenize, and lookup the col2 value. Here is the code.
By the way, turning table Tata into a structure like Toto and performing join is a perfectly rational way to do this, too.
/*Create sample data*/
data toto;
length col1 col2 $ 100;
col1='I2010';
col2='yes';
output;
col1='RT';
col2='toto';
output;
col1='IS-IPI';
col2='tata';
output;
col1='F_CC11_X_CCXBA';
col2='well';
output;
run;
data tata;
length nomvar intitule $ 100;
nomvar='I2010,RT,IS-IPI,F_CC11_X_CCXBA';
run;
/*Now for the solution*/
/*You can do this lookup easily with a data step hash object*/
data tata;
set tata;
length col1 col2 token $ 100;
drop col1 col2 token i sepchar rc;
/*slurp the data in from the Toto data set into the hash*/
if (_n_ = 1) then do;
declare hash toto_hash(dataset: 'work.toto');
rc = toto_hash.definekey('col1');
rc = toto_hash.definedata('col2');
toto_hash.definedone();
end;
/*now walk the tokens in data set tata and perform the lookup to get each value*/
i = 1;
sepchar = ''; /*this will be a comma after the first iteration of the loop*/
intitule = '';
do until (token = '');
/*grab nth item in the comma-separated list*/
token = scan(nomvar, i, ',');
/*lookup the col2 value from the toto data set*/
rc = toto_hash.find(key:token);
if (rc = 0) then do;
/*lookup successful so tack the value on*/
intitule = strip(intitule) || sepchar || col2;
sepchar = ',';
end;
i = i + 1;
end;
run;
Upvotes: 3