Jing Jing
Jing Jing

Reputation: 5

How to extracting all values that contain part of particular number and then deleting them?

How do you extract all values containing part of a particular number and then delete them? I have data where the ID contains different lengths and wants to extract all the IDs with a particular number. For example, if the ID contains either "-00" or "02" or "-01" at the end, pull to be able to see the hit rate that includes those—then delete them from the ID. Is there a more effecient way in creating this code?

I tried to use the substring function to slice it to get the result, but there is some other ID along with the specified position.

Code:

Proc sql;
Create table work.data1 AS
SELECT Product, Amount_sold, Price_per_unit,
CASE WHEN Product Contains "Pen" and Lenghth(ID) >= 9 Then ID = SUBSTR(ID,1,9)
WHEN Product Contains "Book" and Lenghth(ID) >= 11 Then ID = SUBSTR(ID,1,11)
WHEN Product Contains "Folder" and Lenghth(ID) >= 12 Then ID = SUBSTR(ID,1,12)
...
END AS ID
FROM A 
Quit;

Have:

+------------------+-----------------+-------------+----------------+
|     ID           |  Product        | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789        | Pen             |      30     |        2       |
| 63495837229-01   | Book            |      20     |        5       |
| ABC134475472 02  | Folder          |      29     |        7       |
| AB-1235674467-00 | Pencil          |      26     |        1       |
| 69598346-02      | Correction pen  |      15     |      1.50      |
| 6970457688       | Highlighter     |      15     |        2       |
| 584028467        | Color pencil    |      15     |       10       |
+------------------+-----------------+-------------+----------------+

Wanted the final result:

+------------------+-----------------+-------------+----------------+
|     ID           |  Product        | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789        | Pen             |      30     |        2       |
| 63495837229      | Book            |      20     |        5       |
| ABC134475472     | Folder          |      29     |        7       |
| AB-1235674467    | Pencil          |      26     |        1       |
| 69598346         | Correction pen  |      15     |      1.50      |
| 6970457688       | Highlighter     |      15     |        2       |
| 584028467        | Color pencil    |      15     |       10       |
+------------------+-----------------+-------------+----------------+

Upvotes: 0

Views: 32

Answers (2)

Tom
Tom

Reputation: 51621

Just test if the string has any embedded spaces or hyphens and also that the last word when delimited by space or hyphen is 00 or 01 or 02 then chop off the last three characters.

data have;
  infile cards dsd dlm='|' truncover ;
  input id :$20. product :$20. amount_sold price_per_unit;
cards;
123456789        | Pen             |      30     |        2       |
63495837229-01   | Book            |      20     |        5       |
ABC134475472 02  | Folder          |      29     |        7       |
AB-1235674467-00 | Pencil          |      26     |        1       |
69598346-02      | Correction pen  |      15     |      1.50      |
6970457688       | Highlighter     |      15     |        2       |
584028467        | Color pencil    |      15     |       10       |
;

data want;
  set have ;
  if indexc(trim(id),'- ') and scan(id,-1,'- ') in ('00' '01' '02') then
    id = substrn(id,1,length(id)-3)
  ;
run;

Result

                                          amount_     price_
Obs    id               product             sold     per_unit

 1     123456789        Pen                  30         2.0
 2     63495837229      Book                 20         5.0
 3     ABC134475472     Folder               29         7.0
 4     AB-1235674467    Pencil               26         1.0
 5     69598346         Correction pen       15         1.5
 6     6970457688       Highlighter          15         2.0
 7     584028467        Color pencil         15        10.0

Upvotes: 0

gregor
gregor

Reputation: 304

There may be other solutions but you have to use some string functions. I used here the functions substr, reverse (reverting the string) and indexc (position of one of the characters in the string):

data have;
input text $20.;
datalines;
12345678
AB-142353 00
AU-234343-02
132453 02
221344-09
;
run;

data want (drop=reverted pos);
set have;
  if countw(text) gt 1 
  then do;
    reverted=strip(reverse(text));
    pos=indexc(reverted,'- ')+1;
    new=strip(reverse(substr(reverted,pos)));
  end;
  else new=text;
run;

Upvotes: 0

Related Questions