tubaguy
tubaguy

Reputation: 149

Sum Char Columns until Count Reached

I have a table that looks something like this (with more records and more notes):

+------+---------------+------+---------------+------+---------------+------+
|id    |note_1         |len_1 |note_2         |len_2 |note_3         |len_3 |  
+------+---------------+------+---------------+------+---------------+------+
|10001 |"abcde"        |5     |"abc"          |3     |"abcdefg"      |7     |  
|10002 |"defghijk"     |8     |"ghuio"        |5     |"yuio"         |4     | 
|10003 |"abc"          |3     |"defg"         |4     |"qw"           |2     | 
+------+---------------+------+---------------+------+---------------+------+

and I'd like to create a do loop in a SAS data step that concatenates all of the notes until a certain length is reached (in this example, length of 10). Here's the ideal column that would be created for this example, given a max of 10:

+------+--------------+
|id    |concat_notes  |
+------+--------------+
|10001 |"abcdeabcab"  |
|10002 |"defghijkgh"  |
|10003 |"abcdefgqw"   |
+------+--------------+

Here's the code I'm trying to create:

data length;
set notes;
concats = "";
do i=1 to 3;
    if (vvaluex(cats("len_",i)) > 10) then concat_notes= concats;
    else concats = cats(concats,vvaluex(cats("note_",i)));
end;
run;

Note: In reality, all the notes are very long and my max length is 32767. I can't concatenate them all and use substrn to take the first 32,767 because of space issues.

Upvotes: 1

Views: 143

Answers (2)

Richard
Richard

Reputation: 27518

The stated actual max length is 32,767, which is also the max length of SAS character variables. Thus, you might want a blithe concatenation of the desired variables and let normal truncation occur if the result would exceed 32K characters.

Robust code will use a length statement to specify how much space to allocate for the variable storing the resultant.

set notes;
length notes_catted $32767;
notes_catted = cats (of note_:);

Non-robust

set notes;
notes_catted = cats (of note_:);  * variable will be given implicit default length $200;

When length is not specified the DATA Step compiler will choose to create a variable of length $200 instead. From the help file:

Length of Returned Variable

In a DATA step, if the CATS function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. Additionally (my italics), If the concatenation operator (||) returns a value to a variable that has not previously been assigned a length, then that variable is given a length that is the sum of the lengths of the values that are being concatenated.

Upvotes: 6

Joe
Joe

Reputation: 63434

Seems like you can just check the lengths directly:

%let max_length=10;
data have;
input id note_1 $ len_1 note_2 $ len_2 note_3 $ len_3;
datalines;
  10001    abcde           5        abc             3        abcdefg         7         
  10002    defghijk        8        ghuio           5        yuio            4        
  10003    abc             3        defg            4        qw              2        
;;;;
run;
data want;
  set have;
  array notes note_:;
  length cat_note $10;
  do _i = 1 to dim(notes);
    if length(cat_note) + length(notes[_i]) le &max_length. then 
        cat_note = cats(cat_note,notes[_i]);
    else if length(cat_note) lt &max_length. then 
        cat_note = cats(cat_note, substr(notes[_i],1,(&max_length.-length(cat_note))));  *added to get last bit;
  end;
  keep id cat_note;
run;

You can just directly add the length also, if there's a reason you don't want to use the length function to check the lengths, but this seems fine performance-wise... don't go through that vvaluex stuff though, that doesn't really make sense. Just create a summation variable and every time you concat something, add to it.

Remember length() for character variables returns the length up to the last non-space character, so this accurately reflects what you're looking for.

Edited to add in the leftover to get to 10 exactly.

Upvotes: 2

Related Questions