GleDel
GleDel

Reputation: 471

SAS Scan function separator not working as it should

I ran into a problem with the scan function in sas.

The dataset I have contains one variable that needs to be split into multiple variables.

The variable is structured like this:

4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7

I use this code to split this into multiple variables:

data /*ULB.*/work.smart_BCSS_withNISS_&JJ.&K.;
                set work.smart_BCSS_withNISS_&JJ.&K.;
                /* Maand splitsen in variablen */
                mois=scan(smart,1,"__");
                jours=scan(smart,2,"__");
                nbjours=scan(smart,3,"__");
                refClient=scan(smart,4,"__");
                paysPrestation=scan(smart,5,"__");
                wordingFR=scan(smart,6,"__");
                wordingNL=scan(smart,7,"__");
                fonction=scan(smart,8,"__");
                ARTISTIQUE2=scan(smart,9,"__");
                Art_At_LEAST=scan(smart,10,"__");
                totalBrut=scan(smart,11,"__");
                totalImposable=scan(smart,12,"__");
run;

Most of the time this works perfectly. However sometimes the 4th variable 'refClient' contains one single underscore like this:

4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7

Somehow the scan function also detects this single underscore as a separator even though the separator is a double underscore.

Any idea on how to avoid this behavior?

Upvotes: 2

Views: 1867

Answers (3)

data _null_
data _null_

Reputation: 9109

Mildly interesting, the INFILE statement supports a delimiter stringenter image description here.

data test;
   infile cards dlmstr='__';
   input (mois
      jours
      nbjours
      refClient
      paysPrestation
      wordingFR
      wordingNL
      fonction
      ARTISTIQUE2
      Art_At_LEAST
      totalBrut
      totalImposable) (:$32.);
   cards;
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
;;;;
   run;
proc print;
   run;

enter image description here

Upvotes: 1

user667489
user667489

Reputation: 9569

Aurieli's code works, but their answer doesn't explain why. Your understanding of how scan works is incorrect.

  1. If there is more than 1 character in the delimiter specified for scan, each character is treated as a delimiter. You've specified _ twice. If you had specified ab then a and b would both have been treated as delimiters, rather than ab being the delimiter.
  2. scan by default treats multiple consecutive delimiters as a single delimiter, which was why your code treated both __ and _ as delimiters. So if you specified ab as the delimiter string then ba, abba etc. would also be counted as a single delimiter by default.

Upvotes: 5

Llex
Llex

Reputation: 1770

You can use regexp to change single '_' (for example, change to '-') and then scan what you want:

data /*ULB.*/work.test;
                smart="4__J04__1__LE_18__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7";
                smartcr=prxchange("s/(?<=[^_])(_{1})(?=[^_])/-/",-1,smart);
                /* Maand splitsen in variablen */
                mois=scan(smartcr,1,"__");
                jours=scan(smartcr,2,"__");
                nbjours=scan(smartcr,3,"__");
                refClient=tranwrd(scan(smartcr,4,"__"),'-','_');
                paysPrestation=scan(smartcr,5,"__");
                wordingFR=scan(smartcr,6,"__");
                wordingNL=scan(smartcr,7,"__");
                fonction=scan(smartcr,8,"__");
                ARTISTIQUE2=scan(smartcr,9,"__");
                Art_At_LEAST=scan(smartcr,10,"__");
                totalBrut=scan(smartcr,11,"__");
                totalImposable=scan(smartcr,12,"__");
run;

Upvotes: 1

Related Questions