Reputation: 149
I have a table that looks something like this (with more records and more notes):
+------+---------------+------+---------------+------+---------------+------+
|id |note_1 |len_1 |note_2 |len_2 |note_3 |len_3 |
+------+---------------+------+---------------+------+---------------+------+
|10001 |"abcde" |5 |"abc" |3 |"abcdefg" |7 |
|10002 |"defghijk" |8 |"ghuio" |5 |"yuio" |4 |
|10003 |"abc" |3 |"defg" |4 |"qw" |2 |
+------+---------------+------+---------------+------+---------------+------+
and I'd like to create a do loop in a SAS data
step that concatenates all of the notes until a certain length is reached (in this example, length of 10). Here's the ideal column that would be created for this example, given a max of 10:
+------+--------------+
|id |concat_notes |
+------+--------------+
|10001 |"abcdeabcab" |
|10002 |"defghijkgh" |
|10003 |"abcdefgqw" |
+------+--------------+
Here's the code I'm trying to create:
data length;
set notes;
concats = "";
do i=1 to 3;
if (vvaluex(cats("len_",i)) > 10) then concat_notes= concats;
else concats = cats(concats,vvaluex(cats("note_",i)));
end;
run;
Note: In reality, all the notes are very long and my max length is 32767. I can't concatenate them all and use substrn
to take the first 32,767 because of space issues.
Upvotes: 1
Views: 143
Reputation: 27518
The stated actual max length is 32,767, which is also the max length of SAS character variables. Thus, you might want a blithe concatenation of the desired variables and let normal truncation occur if the result would exceed 32K characters.
Robust code will use a length
statement to specify how much space to allocate for the variable storing the resultant.
set notes;
length notes_catted $32767;
notes_catted = cats (of note_:);
Non-robust
set notes;
notes_catted = cats (of note_:); * variable will be given implicit default length $200;
When length is not specified the DATA Step compiler will choose to create a variable of length $200 instead. From the help file:
Length of Returned Variable
In a DATA step, if the CATS function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. Additionally (my italics), If the concatenation operator (||) returns a value to a variable that has not previously been assigned a length, then that variable is given a length that is the sum of the lengths of the values that are being concatenated.
Upvotes: 6
Reputation: 63434
Seems like you can just check the lengths directly:
%let max_length=10;
data have;
input id note_1 $ len_1 note_2 $ len_2 note_3 $ len_3;
datalines;
10001 abcde 5 abc 3 abcdefg 7
10002 defghijk 8 ghuio 5 yuio 4
10003 abc 3 defg 4 qw 2
;;;;
run;
data want;
set have;
array notes note_:;
length cat_note $10;
do _i = 1 to dim(notes);
if length(cat_note) + length(notes[_i]) le &max_length. then
cat_note = cats(cat_note,notes[_i]);
else if length(cat_note) lt &max_length. then
cat_note = cats(cat_note, substr(notes[_i],1,(&max_length.-length(cat_note)))); *added to get last bit;
end;
keep id cat_note;
run;
You can just directly add the length also, if there's a reason you don't want to use the length
function to check the lengths, but this seems fine performance-wise... don't go through that vvaluex stuff though, that doesn't really make sense. Just create a summation variable and every time you concat something, add to it.
Remember length() for character variables returns the length up to the last non-space character, so this accurately reflects what you're looking for.
Edited to add in the leftover to get to 10 exactly.
Upvotes: 2