Reputation: 29
I am creating a dot plot:
fat <- (airlines$fat1 + airlines$fat2)
ggplot(airlines, aes(y = airline, x = fat)) + geom_point(stat = "identity")
but the result is very messy, so I would like to order the airlines by the fat
variable in ascending order.
I have tried:
airlines1 <- data.frame(airline = rownames(airlines), fat, row.names = NULL)
airlines2 <- factor(airlines1$airline,
levels = airlines[order(airlines$fat),"fat"])
ggplot(airlines2, aes(y = airline, x = fat)) +
geom_point(stat = "identity")
But I get two errors:
"Error: Column `fat` not found"
and
"Error: ggplot2 doesn't know how to deal with data of class factor"
How should I order it?
This is the data I am analysing:
dput(airlines)
structure(list(airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas",
"Aeromexico*", "Air Canada", "Air France", "Air India*", "Air New Zealand*",
"Alaska Airlines*", "Alitalia", "All Nippon Airways", "American*",
"Austrian Airlines", "Avianca", "British Airways*", "Cathay Pacific*",
"China Airlines", "Condor", "COPA", "Delta / Northwest*", "Egyptair",
"El Al", "Ethiopian Airlines", "Finnair", "Garuda Indonesia",
"Gulf Air", "Hawaiian Airlines", "Iberia", "Japan Airlines",
"Kenya Airways", "KLM*", "Korean Air", "LAN Airlines", "Lufthansa*",
"Malaysia Airlines", "Pakistan International", "Philippine Airlines",
"Qantas*", "Royal Air Maroc", "SAS*", "Saudi Arabian", "Singapore Airlines",
"South African", "Southwest Airlines", "Sri Lankan / AirLanka",
"SWISS*", "TACA", "TAM", "TAP - Air Portugal", "Thai Airways",
"Turkish Airlines", "United / Continental*", "US Airways / America West*",
"Vietnam Airlines", "Virgin Atlantic", "Xiamen Airlines"), avseatkm = c(320906734,
1197672318, 385803648, 596871813, 1865253802, 3004002661, 869253552,
710174817, 965346773, 698012498, 1841234177, 5228357340, 358239823,
396922563, 3179760952, 2582459303, 813216487, 417982610, 550491507,
6525658894, 557699891, 335448023, 488560643, 506464950, 613356665,
301379762, 493877795, 1173203126, 1574217531, 277414794, 1874561773,
1734522605, 1001965891, 3426529504, 1039171244, 348563137, 413007158,
1917428984, 295705339, 682971852, 859673901, 2376857805, 651502442,
3276525770, 325582976, 792601299, 259373346, 1509195646, 619130754,
1702802250, 1946098294, 7139291291, 2455687887, 625084918, 1005248585,
430462962), inc1 = c(2, 76, 6, 3, 2, 14, 2, 3, 5, 7, 3, 21, 1,
5, 4, 0, 12, 2, 3, 24, 8, 1, 25, 1, 10, 1, 0, 4, 3, 2, 7, 12,
3, 6, 3, 8, 7, 1, 5, 5, 7, 2, 2, 1, 2, 2, 3, 8, 0, 8, 8, 19,
16, 7, 1, 9), fatacc1 = c(0, 14, 0, 1, 0, 4, 1, 0, 0, 2, 1, 5,
0, 3, 0, 0, 6, 1, 1, 12, 3, 1, 5, 0, 3, 0, 0, 1, 1, 0, 1, 5,
2, 1, 1, 3, 4, 0, 3, 0, 2, 2, 1, 0, 1, 1, 1, 3, 0, 4, 3, 8, 7,
3, 0, 1), fat1 = c(0, 128, 0, 64, 0, 79, 329, 0, 0, 50, 1, 101,
0, 323, 0, 0, 535, 16, 47, 407, 282, 4, 167, 0, 260, 0, 0, 148,
520, 0, 3, 425, 21, 2, 34, 234, 74, 0, 51, 0, 313, 6, 159, 0,
14, 229, 3, 98, 0, 308, 64, 319, 224, 171, 0, 82), inc2 = c(0,
6, 1, 5, 2, 6, 4, 5, 5, 4, 7, 17, 1, 0, 6, 2, 2, 0, 0, 24, 4,
1, 5, 0, 4, 3, 1, 5, 0, 2, 1, 1, 0, 3, 3, 10, 2, 5, 3, 6, 11,
2, 1, 8, 4, 3, 1, 7, 0, 2, 8, 14, 11, 1, 0, 2), fatacc2 = c(0,
1, 0, 0, 0, 2, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0,
2, 0, 2, 1, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 0), fat2 = c(0, 88, 0, 0,
0, 337, 158, 7, 88, 0, 0, 416, 0, 0, 0, 0, 225, 0, 0, 51, 14,
0, 92, 0, 22, 143, 0, 0, 0, 283, 0, 0, 0, 0, 537, 46, 1, 0, 0,
110, 0, 83, 0, 0, 0, 0, 3, 188, 0, 1, 84, 109, 23, 0, 0, 0),
model = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
"41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
"51", "52", "53", "54", "55", "56")), .Names = c("airline",
"avseatkm", "inc1", "fatacc1", "fat1", "inc2", "fatacc2", "fat2",
"model"), row.names = c(NA, -56L), class = c("tbl_df", "tbl",
"data.frame"))
Upvotes: 2
Views: 178
Reputation: 42544
There is no need to load dplyr or otherwise manipulate the underlying airlines data.frame. This can be done concisely, on the fly, in the call to aes():
library(ggplot2)
ggplot(airlines, aes(y = reorder(airline, fat1 + fat2), x = fat1 + fat2)) +
geom_point() + xlab("Fatalities") + ylab(NULL)
The call to reorder() coerces airline to a factor whose levels are ordered by increasing value of fat1 + fat2.
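To see what reorder() does in isolation, here is a minimal base R sketch on a made-up three-row data frame. The names toy, carrier and deaths are purely illustrative and are not part of the airlines data:

```r
# Toy data; names and values are invented for illustration only
toy <- data.frame(
  carrier = c("A", "B", "C"),
  deaths  = c(30, 0, 12),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument
ordered_carrier <- reorder(toy$carrier, toy$deaths)

# Inspect the resulting level order (smallest value first)
levels(ordered_carrier)
```

ggplot2 draws the levels of a discrete y axis from bottom to top, so the first level (the smallest total) should end up at the bottom of the plot.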
For dealing with factors, I find Hadley Wickham's forcats package very useful. Its fct_reorder() function has a .desc parameter which can be used to reverse the order of the factor levels explicitly:
ggplot(airlines, aes(y = forcats::fct_reorder(airline, fat1 + fat2, .desc = TRUE),
x = fat1 + fat2)) +
geom_point() + xlab("Fatalities") + ylab(NULL)
I personally find this more transparent than multiplying the X argument of base R's reorder() function by -1, e.g., reorder(airline, -(fat1 + fat2)). However, your mileage may vary.
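As a rough check of the negation trick, the following base R sketch compares the level order with and without the minus sign. The toy data frame and its column names are made up for illustration:

```r
# Toy data; names and values are invented for illustration only
toy <- data.frame(
  carrier = c("A", "B", "C"),
  deaths  = c(30, 0, 12),
  stringsAsFactors = FALSE
)

# Levels sorted by increasing value
ascending <- levels(reorder(toy$carrier, toy$deaths))

# Negating the sort key sorts the levels by decreasing value instead
descending <- levels(reorder(toy$carrier, -toy$deaths))

# With no tied values, one order is simply the reverse of the other
identical(descending, rev(ascending))
```

Note that this toy example has no ties; with tied values the two orders are not guaranteed to be exact mirror images of each other.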
Upvotes: 0
Reputation: 4768
With dplyr and ggplot2:
library(ggplot2)
library(dplyr)
airlines %>%
select(airline, fat1, fat2) %>%
mutate(fat = fat1 + fat2) %>%
ggplot(aes(fat, reorder(airline, fat))) +
geom_point(stat = "identity") +
labs(y = "airline", x = "fatalities")
If you want the order reversed, you can change fat to -fat:
airlines %>%
select(airline, fat1, fat2) %>%
mutate(fat = fat1 + fat2) %>%
ggplot(aes(fat, reorder(airline, -fat))) +
geom_point(stat = "identity") +
labs(y = "airline", x = "fatalities")
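For comparison, the approach tried in the question (setting the factor levels by hand) can probably be made to work as well. Two things seem to go wrong there: the airlines tibble has no fat column (the total was stored in a separate vector), and the object passed to ggplot() is a factor rather than a data frame. A minimal base R sketch of a corrected version, using a made-up toy data frame in place of airlines:

```r
# Toy stand-in for the airlines data; values are invented for illustration
toy <- data.frame(
  airline = c("A", "B", "C"),
  fat1 = c(20, 0, 5),
  fat2 = c(10, 0, 7),
  stringsAsFactors = FALSE
)

# Keep the total as a column so that ggplot() can find it in the data
toy$fat <- toy$fat1 + toy$fat2

# The levels must be the airline names (not the fat values), sorted by fat
toy$airline <- factor(toy$airline, levels = toy$airline[order(toy$fat)])

levels(toy$airline)

# The data frame, not the factor, is what goes to ggplot(), e.g.:
# ggplot(toy, aes(x = fat, y = airline)) + geom_point()
```

This only works when the airline names are unique, as they are in the posted data, because factor levels cannot contain duplicates.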
Upvotes: 2