Reputation: 172
Consider the following data:
As you can see, the values of the variable are inherently numeric, but some of them include text. I have tried every permutation of do repeat...end repeat I could think of to remove the non-numeric characters and leave just the numbers, without success.
Is there some syntax that will do it? Is there a function that checks whether a substr contains any of a set of characters? Then I could create a set that represents all the digits, loop through each character in the string, and if it is not in the set, replace it with a null.
Upvotes: 1
Views: 1816
Reputation: 3166
This answer on IBM support answers a somewhat similar question: https://www.ibm.com/support/pages/removing-unwanted-characters-strings
You will have a lot more characters to search (the whole a-z, A-Z and probably some non-letter characters as well), but it should work.
You might also want to use the newer CHAR.INDEX and REPLACE functions, if you are using SPSS 23 or newer; see the official IBM SPSS documentation on them:
https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/syn_transformation_expressions_string_functions.html
Later edit (after clarifications and suggestions from the OP):
You need to adjust two things in the IBM examples:
1. Make the loop exit after a fixed number of iterations (not when #I=0, which stops at the first character it does not find). In the example below, the loop index runs from 1 to CHAR.LENGTH(x), so there is one pass per character in the string.
2. Specify all the characters you want to remove: a to z, space, the quotation mark (written as 2 consecutive quotation marks), and so on; anything you think you might want to clean.
Then this should work (and indeed, Stack Overflow's formatting does not seem to be working properly at the moment):
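* Lowercase first so that only a-z needs to be listed.
COMPUTE x = LOWER(x).
* One pass per character; the scratch variable #k is not saved to the dataset.
LOOP #k = 1 TO CHAR.LENGTH(x).
* Position of the first unwanted character (divisor 1 searches each character separately); 0 if none.
COMPUTE #I = CHAR.INDEX(x, 'abcdefghijklmnopqrstuvwxyz+, ''', 1).
* Cut that character out by joining the text before and after it.
IF (#I > 0) x = CONCAT(CHAR.SUBSTR(x, 1, #I-1), CHAR.SUBSTR(x, #I+1)).
END LOOP.
EXECUTE.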
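The approach described in the question (walk through the string and keep only characters that belong to a set of digits) can also be sketched directly, which avoids listing every character to remove. This is a minimal, untested sketch under some assumptions: the string variable is named x as above, the names x_digits and x_num are hypothetical, the width A100 is a guess at a safe maximum, and only the digits 0-9 are kept (add '.' or '-' to the set if decimals or negative values matter).

```spss
* x_digits and x_num are hypothetical names; adjust A100 to the width of x.
STRING x_digits (A100).
STRING #c (A1).
COMPUTE x_digits = ''.
LOOP #k = 1 TO CHAR.LENGTH(x).
* Take the k-th character and append it only if it is one of the digits.
COMPUTE #c = CHAR.SUBSTR(x, #k, 1).
IF (CHAR.INDEX('0123456789', #c) > 0) x_digits = CONCAT(RTRIM(x_digits), #c).
END LOOP.
* Convert the cleaned string to a number; an empty string becomes system-missing.
COMPUTE x_num = NUMBER(x_digits, F10).
EXECUTE.
```

Because the result goes into new variables, the original x stays intact and can be compared against x_num to check the cleaning.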
Upvotes: 2