Reputation: 5
How do you extract all values containing part of a particular number and then delete them? I have data where the ID contains different lengths and wants to extract all the IDs with a particular number. For example, if the ID contains either "-00" or "02" or "-01" at the end, pull to be able to see the hit rate that includes those—then delete them from the ID. Is there a more effecient way in creating this code?
I tried to use the substring function to slice it to get the result, but there is some other ID along with the specified position.
Code:
Proc sql;
Create table work.data1 AS
SELECT Product, Amount_sold, Price_per_unit,
CASE WHEN Product Contains "Pen" and Lenghth(ID) >= 9 Then ID = SUBSTR(ID,1,9)
WHEN Product Contains "Book" and Lenghth(ID) >= 11 Then ID = SUBSTR(ID,1,11)
WHEN Product Contains "Folder" and Lenghth(ID) >= 12 Then ID = SUBSTR(ID,1,12)
...
END AS ID
FROM A
Quit;
Have:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229-01 | Book | 20 | 5 |
| ABC134475472 02 | Folder | 29 | 7 |
| AB-1235674467-00 | Pencil | 26 | 1 |
| 69598346-02 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Wanted the final result:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229 | Book | 20 | 5 |
| ABC134475472 | Folder | 29 | 7 |
| AB-1235674467 | Pencil | 26 | 1 |
| 69598346 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Upvotes: 0
Views: 32
Reputation: 51621
Just test if the string has any embedded spaces or hyphens and also that the last word when delimited by space or hyphen is 00 or 01 or 02 then chop off the last three characters.
data have;
infile cards dsd dlm='|' truncover ;
input id :$20. product :$20. amount_sold price_per_unit;
cards;
123456789 | Pen | 30 | 2 |
63495837229-01 | Book | 20 | 5 |
ABC134475472 02 | Folder | 29 | 7 |
AB-1235674467-00 | Pencil | 26 | 1 |
69598346-02 | Correction pen | 15 | 1.50 |
6970457688 | Highlighter | 15 | 2 |
584028467 | Color pencil | 15 | 10 |
;
data want;
set have ;
if indexc(trim(id),'- ') and scan(id,-1,'- ') in ('00' '01' '02') then
id = substrn(id,1,length(id)-3)
;
run;
Result
amount_ price_
Obs id product sold per_unit
1 123456789 Pen 30 2.0
2 63495837229 Book 20 5.0
3 ABC134475472 Folder 29 7.0
4 AB-1235674467 Pencil 26 1.0
5 69598346 Correction pen 15 1.5
6 6970457688 Highlighter 15 2.0
7 584028467 Color pencil 15 10.0
Upvotes: 0
Reputation: 304
There may be other solutions but you have to use some string functions. I used here the functions substr, reverse (reverting the string) and indexc (position of one of the characters in the string):
data have;
input text $20.;
datalines;
12345678
AB-142353 00
AU-234343-02
132453 02
221344-09
;
run;
data want (drop=reverted pos);
set have;
if countw(text) gt 1
then do;
reverted=strip(reverse(text));
pos=indexc(reverted,'- ')+1;
new=strip(reverse(substr(reverted,pos)));
end;
else new=text;
run;
Upvotes: 0