Reputation: 471
I ran into a problem with the scan function in sas.
The dataset I have contains one variable that needs to be split into multiple variables.
The variable is structured like this:
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
I use this code to split this into multiple variables:
data /*ULB.*/work.smart_BCSS_withNISS_&JJ.&K.;
set work.smart_BCSS_withNISS_&JJ.&K.;
/* Maand splitsen in variablen */
mois=scan(smart,1,"__");
jours=scan(smart,2,"__");
nbjours=scan(smart,3,"__");
refClient=scan(smart,4,"__");
paysPrestation=scan(smart,5,"__");
wordingFR=scan(smart,6,"__");
wordingNL=scan(smart,7,"__");
fonction=scan(smart,8,"__");
ARTISTIQUE2=scan(smart,9,"__");
Art_At_LEAST=scan(smart,10,"__");
totalBrut=scan(smart,11,"__");
totalImposable=scan(smart,12,"__");
run;
Most of the time this works perfectly. However sometimes the 4th variable 'refClient' contains one single underscore like this:
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
Somehow the scan function also detects this single underscore as a separator even though the separator is a double underscore.
Any idea on how to avoid this behavior?
Upvotes: 2
Views: 1867
Reputation: 9109
Mildly interesting, the INFILE statement supports a delimiter string.
data test;
infile cards dlmstr='__';
input (mois
jours
nbjours
refClient
paysPrestation
wordingFR
wordingNL
fonction
ARTISTIQUE2
Art_At_LEAST
totalBrut
totalImposable) (:$32.);
cards;
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
;;;;
run;
proc print;
run;
Upvotes: 1
Reputation: 9569
Aurieli's code works, but their answer doesn't explain why. Your understanding of how scan
works is incorrect.
scan
, each character is treated as a delimiter. You've specified _
twice. If you had specified ab
then a
and b
would both have been treated as delimiters, rather than ab
being the delimiter.scan
by default treats multiple consecutive delimiters as a single delimiter, which was why your code treated both __
and _
as delimiters. So if you specified ab
as the delimiter string then ba
, abba
etc. would also be counted as a single delimiter by default.Upvotes: 5
Reputation: 1770
You can use regexp to change single '_' (for example, change to '-') and then scan what you want:
data /*ULB.*/work.test;
smart="4__J04__1__LE_18__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7";
smartcr=prxchange("s/(?<=[^_])(_{1})(?=[^_])/-/",-1,smart);
/* Maand splitsen in variablen */
mois=scan(smartcr,1,"__");
jours=scan(smartcr,2,"__");
nbjours=scan(smartcr,3,"__");
refClient=tranwrd(scan(smartcr,4,"__"),'-','_');
paysPrestation=scan(smartcr,5,"__");
wordingFR=scan(smartcr,6,"__");
wordingNL=scan(smartcr,7,"__");
fonction=scan(smartcr,8,"__");
ARTISTIQUE2=scan(smartcr,9,"__");
Art_At_LEAST=scan(smartcr,10,"__");
totalBrut=scan(smartcr,11,"__");
totalImposable=scan(smartcr,12,"__");
run;
Upvotes: 1