thebatman

Reputation: 5

Sort by variable in twoway scatter. X-axis stays alphabetical and sort produces gibberish: why?

I have two variables:

ie_ctotal
cntry2

Note: cntry2 is an encoded version of a string variable cntry: I don't know if this may be affecting things.

I want a twoway scatter of ie_ctotal against cntry2, and I want to sort this scatter by another variable, gdppc:

 twoway  || scatter ie_ctotal cntry2, c(1) xlabel(,valuelabel)

The above without sort works fine. Once I introduce sort, however,

 twoway  || scatter ie_ctotal cntry2, c(1) sort(gdppc) xlabel(,valuelabel)

The graph turns gibberish, or rather it connects according to the sort, but the x axis remains alphabetical, making the connections seem scribbled.

Any ideas as to what I am doing wrong?

Note: I don't want to sort the original data, because I was advised in previous questions that this is a bad idea. So I want to sort the data only for this one graph.

Upvotes: 0

Views: 6318

Answers (2)

Nick Cox

Reputation: 37258

There is no reproducible example here, and not even a graph, but it is possible to guess the problem.

In your commands above you are typing

c(1) 

which is ill-advised, although Stata does the right thing. It would be better to type

c(l)

which instructs Stata to join data points on your graph in a line. (Nod to @Dimitriy V. Masterov on this detail.)

In your first example, the values of cntry2 define the x axis.

As you say, the effect of sort(gdppc) is to connect points in order of their values from lowest gdppc to highest. The result is clearly not what you want.

Here is a dopey reproducible example that makes the point.

. sysuse auto, clear
(1978 Automobile Data)

. scatter mpg weight, sort(price) c(l)

[Graph: mpg against weight, with points connected in order of price]

You want to sort the countries into gdppc order. This is like sorting make in Stata's auto data according to mpg, but then plotting weight. Here I do this just for foreign cars. It's not a very good graph, but it sounds close in spirit to what you want. This solution requires labmask, which is not part of official Stata: type search labmask and then download it from the Stata Journal website.

sysuse auto, clear
keep if foreign
sort mpg 
gen obsno = _n 
labmask obsno, values(make) 
scatter weight obsno, xla(1/22, valuelabel ang(v) noticks) xtitle("") 

[Graph: weight for foreign cars, with makes on the x axis in order of mpg]

In a nutshell: the sort() option here defines a connection order; it doesn't map the x axis variable to a reshuffled version. You need to do that yourself before drawing the graph.

UPDATE In fact, you can get essentially the same graph without any prior manipulation:

graph dot (asis) weight if foreign, over(make, sort(mpg) label(ang(v)))  vertical linetype(line) lines(lc(none))

This is going along with the OP's interest in putting labelled categories on the x axis. A graph easier to read would put them on the y axis: then text can be read left to right. To get that, omit the vertical option above: horizontal is the default for graph dot. Although the command above omits guide lines by setting their colour to none, very thin light colour guide lines can help.
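
Applied to the variables in the question, the same idea might look like the sketch below. It is only an illustration under assumptions: the country names and all numbers are made-up toy values, there is one observation per country, order is a hypothetical helper variable, and labmask is already installed. The preserve and restore pair leaves the data in memory unchanged, which goes along with the wish not to sort the original data.

```stata
* Toy data (made-up values, for illustration only)
clear
input str8 cntry ie_ctotal gdppc
"A" 10 300
"B" 40 100
"C" 20 400
"D" 30 200
end
encode cntry, gen(cntry2)    // alphabetical coding, as in the question

* Rank countries by gdppc and plot against the rank (needs labmask)
preserve
sort gdppc
gen order = _n                   // hypothetical helper: 1 = lowest gdppc
labmask order, values(cntry)     // show country names for the ranks
scatter ie_ctotal order, c(l) xlabel(1/4, valuelabel) xtitle("")
restore                          // original data and sort order come back

* Alternative with no data manipulation; points are not connected
graph dot (asis) ie_ctotal, over(cntry, sort(gdppc) label(ang(v))) vertical
```

With real data, the 4 in xlabel(1/4, ...) would be replaced by the number of countries.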

Upvotes: 2

dimitriy

Reputation: 9470

This uses the trick of encoding using the order of another variable to get the sorting right:

sysuse auto, clear
keep if foreign==1

sencode make, gen(encoded_make) gsort(-weight)
levelsof encoded_make, local(labels)
tw scatter price encoded_make, mlabel(weight) c(l) xlabel(`labels', value angle(45)) sort(weight)

You will need to install sencode from SSC (ssc install sencode).

Upvotes: 1
