Bruce Mitchell

Reputation: 169

SPSS - How to isolate the maximum value in a variable so you can use it in syntax?

I have a table of areas with data. For a particular operation, I want to exclude the top and bottom 1% of areas, as they include extreme outliers.

Seems to me that a way forward is:

SORT CASES BY theVariableIwantToAnalyse  (A) .
NUMERIC id (F12.0) .  * create a casenum label "id"
COMPUTE id = $CASENUM. * populate it with casenum
EXECUTE.
NUMERIC idmax (F12.4) .   * create a variable to contain the highest value for "id" 
NUMERIC id1perc (F12.4) . * create a variable to contain 1% of the highest value for "id"  
COMPUTE idmax = MAX(id) .    * determine the highest value for id. This 'mock-syntax' line does not work.   
COMPUTE id1perc = idmax / 100 . * 1% of the highest value for "id"  
SELECT CASES WHERE ID >= id1perc or ID <= idmax - id1perc .

Draw graphs etc. I then need to

SORT CASES BY theNextVariableIwantToAnalyse  (A) .
COMPUTE id = $CASENUM. * populate it with the NEW casenum order
EXECUTE.

etc ...

Upvotes: 1

Views: 8923

Answers (3)

HelpingHand

Reputation: 1

If the variable you are looking at is named MYVAR, then the following will produce a new variable (RMYVAR) with highest value=1 and next highest=2 and so on...

RANK VARIABLES=MYVAR (D) /TIES=CONDENSE .

If you change the (D) to (A), the lowest value gets rank 1 instead. With CONDENSE, tied values share a rank and the ranks stay consecutive: 1, 2, 3, 4, etc. With LOW, it's like a race: if two people finish first, the next is 3rd and there is no 2nd. With HIGH, the tied values both get the higher rank (2 in that example) and there is no 1st.
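
The tie-handling difference is easier to see on a tiny example. This is only a sketch: the four data values and the target names r_condense and r_low are made up for illustration.

```spss
* Four made-up values with one tie (the two 20s).
DATA LIST FREE / MYVAR.
BEGIN DATA
10 20 20 30
END DATA.

* Same ascending ranking, two different tie rules.
RANK VARIABLES=MYVAR (A) /RANK INTO r_condense /TIES=CONDENSE /PRINT=NO.
RANK VARIABLES=MYVAR (A) /RANK INTO r_low /TIES=LOW /PRINT=NO.

* Show both rank variables side by side.
LIST.
```

r_condense should come out as 1, 2, 2, 3 and r_low as 1, 2, 2, 4, so only the non-condensed ranks still run up to the number of cases.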

Upvotes: 0

DocBuckets

Reputation: 261

Try this to flag the top and bottom 1%. After running it, add FILTER BY filter. to temporarily exclude the extreme cases, or SELECT IF filter. ... EXECUTE. to delete them.

EDIT: note that tied values are condensed into a single rank by the RANK method used here (specifically the /TIES option). This might not be ideal if your data can contain repeated values. Change the /TIES option if that's the case.

************* GENERATE RANDOM DATA *****************.
INPUT PROGRAM.
-       LOOP #I = 1 TO 1000.
-             COMPUTE Y = RV.NORMAL(100,10).
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

dataset name exampleData WINDOW=front.
EXECUTE.


************* RANK DATA  *************.
DATASET ACTIVATE exampleData.
RANK VARIABLES=Y (A)
  /RFRACTION INTO fractile
  /TIES=CONDENSE.

************* MAKE A FILTER  *************.
COMPUTE filter = (fractile>0.01 AND fractile < 0.99).
EXECUTE.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y filter MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y=col(source(s), name("Y"))
  DATA: filter=col(source(s), name("filter"), unit.category())
  GUIDE: axis(dim(1), label("Y"))
  GUIDE: axis(dim(2), label("Frequency"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("filter"))
  ELEMENT: interval.stack(position(summary.count(bin.rect(Y))), color.interior(filter), 
    shape.interior(shape.square))
END GPL.
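
As a sketch of the two ways to apply the flag computed above (the FREQUENCIES call is just a stand-in for whatever analysis you want to run, not part of the original example):

```spss
* Option 1: temporarily exclude the extreme cases, then switch them back on.
FILTER BY filter.
FREQUENCIES VARIABLES=Y /FORMAT=NOTABLE /STATISTICS=MINIMUM MAXIMUM.
FILTER OFF.

* Option 2: permanently drop the extreme cases from the active dataset.
SELECT IF (filter = 1).
EXECUTE.
```

Option 1 is the safer choice if you plan to repeat the trimming for another variable, since the full set of cases is still there after FILTER OFF.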

Upvotes: 2

JKP

Reputation: 5417

A MUCH easier solution is just to use RANK and then select on the ranks you want to exclude.
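
A minimal sketch of that idea, assuming the variable is called MYVAR; the names pct and keep are hypothetical, and /TIES=MEAN is just one reasonable choice:

```spss
* Percentile rank (0 to 100) of each case on MYVAR.
RANK VARIABLES=MYVAR (A)
  /PERCENT INTO pct
  /TIES=MEAN
  /PRINT=NO.

* Flag everything between the bottom 1% and the top 1%.
COMPUTE keep = (pct > 1 AND pct <= 99).
FILTER BY keep.

* Draw graphs etc. here, then restore all cases.
FILTER OFF.
```

For the next variable, repeat the same steps with different INTO and flag names (for example pct2 and keep2). No sorting or $CASENUM bookkeeping is needed.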

Upvotes: 1
