user16368087

Reputation: 63

Sort incomplete string of numeric values on Y axis in ggplot

I am trying to use participant record IDs on the y axis in my ggplot. The record IDs skip around (e.g. 1, 3, 10, 100). My question is three-fold:

  1. I'd like to display each ID on the y axis, but when I convert with as.numeric(as.character(record_id)), the axis is ordered correctly but stays continuous, so the gaps between the non-consecutive record IDs show up as empty space.

  2. If I convert to as.character, it's the right concept but I can't figure out how to sort so it doesn't appear as 1, 10, 100, 3, even when using str_order.

    So far, using ggplot(sincevax_reshape, aes(x=value, y=as.character(sort(as.numeric(record_id))))) has gotten me the look of the y axis but not the correct sort.

  3. Once I get the record IDs to be properly sorted on the Y axis, is there a way to increase the vertical spacing between each so the Y axis isn't so crowded/clustered?

     record_id  variable value
6           10    Sample  -182
7           11    Sample  -233
14          21    Sample  -189
16          23    Sample  -232
17          24    Sample  -214
21          30    Sample  -197
23          32    Sample  -133
24          33    Sample  -203
28          39    Sample  -165
29          41    Sample  -226
1105         3     Today   106
1106         4     Today   163
1107         6     Today    79
1108         7     Today   113
1109         9     Today   133
1110        10     Today   177
1111        11     Today   118

End goal is something like this without all the blank space at the top:

Upvotes: 3

Views: 343

Answers (3)

navona

Reputation: 92

One option in the spirit of your first proposal might be to assign your nonconsecutive numeric IDs a rank based on their value, e.g.:

df$record_id_rank <- rank(df$record_id)

Note that rank() returns fractional (averaged) values for duplicate IDs, which it sounds like you may have in your long data, and flooring those values still leaves gaps in the sequence. In that case, a dense rank keeps tied IDs on the same integer with no gaps:

df$record_id_rank <- as.integer(factor(df$record_id))

Then, you can plot df$record_id_rank on your y-axis as in the other answers. (If you want the axis labels to be the real IDs, not the sequential numbering, you can supply them through the breaks and labels arguments of scale_y_continuous().)
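A minimal sketch of the rank idea with the real IDs restored as axis labels. The toy data frame reuses a few rows from the question, and the names ids and p are only illustrative. match() against the sorted unique IDs is used here as one way to get gap-free integer positions even when IDs repeat.

```r
library(ggplot2)

# Toy long-format data with repeated, non-consecutive IDs (rows from the question)
df <- data.frame(
  record_id = c(10, 11, 21, 3, 4, 10, 11),
  variable  = c("Sample", "Sample", "Sample", "Today", "Today", "Today", "Today"),
  value     = c(-182, -233, -189, 106, 163, 177, 118)
)

# Sorted unique IDs; match() gives each row the position of its ID in that list
ids <- sort(unique(df$record_id))
df$record_id_rank <- match(df$record_id, ids)

p <- ggplot(df, aes(x = value, y = record_id_rank, colour = variable)) +
  geom_point() +
  # One tick per position, labelled with the real ID instead of the rank
  scale_y_continuous(breaks = seq_along(ids), labels = ids)
```

Printing p draws the plot. Because the y values are consecutive integers, the IDs end up evenly spaced regardless of how far apart the original numbers are.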

Upvotes: 0

TarJae

Reputation: 79311

Are you looking for this?

library(dplyr)    # provides the %>% pipe
library(ggplot2)

df %>%
    ggplot(aes(x = factor(record_id), y = value)) +
    geom_col() +
    coord_flip()


Upvotes: 0

Allan Cameron

Reputation: 174586

You could try converting the numbers to factors:

library(ggplot2)

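# sort(unique(...)) keeps the levels in numeric order and avoids the
# "factor level is duplicated" error when an ID appears in more than one row
df$record_id <- factor(df$record_id, levels = sort(unique(df$record_id)))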

ggplot(df, aes(x = value, y = record_id)) + 
  geom_col()

Created on 2021-08-17 by the reprex package (v2.0.0)
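On the third point of the question (a crowded axis), one possible approach is to leave the plot alone and scale the height of the saved image with the number of IDs. The sketch below is only an illustration: the 0.25 inch per ID figure, the output file name, and the variable names plot_height and out_file are assumptions, not anything taken from the question.

```r
library(ggplot2)

# Small self-contained example using a few IDs from the question
df <- data.frame(
  record_id = c(10L, 11L, 21L, 23L, 24L),
  value     = c(-182L, -233L, -189L, -232L, -214L)
)
df$record_id <- factor(df$record_id, levels = sort(unique(df$record_id)))

p <- ggplot(df, aes(x = value, y = record_id)) +
  geom_col()

# Assumed rule of thumb: about 0.25 inch per ID plus 1 inch for margins
plot_height <- 1 + 0.25 * nlevels(df$record_id)

# Save to a temporary PDF; a taller page gives each ID more vertical room
out_file <- file.path(tempdir(), "record_ids.pdf")
ggsave(out_file, plot = p, width = 6, height = plot_height, units = "in")
```

With several hundred IDs the same rule gives a very tall page, so the per-ID height may need tuning, or the axis text can be shrunk with theme(axis.text.y = element_text(size = ...)).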

Data used

df <- structure(list(record_id = c(10L, 11L, 21L, 23L, 24L, 30L, 32L, 
33L, 39L, 41L), variable = c("Sample", "Sample", "Sample", "Sample", 
"Sample", "Sample", "Sample", "Sample", "Sample", "Sample"), 
    value = c(-182L, -233L, -189L, -232L, -214L, -197L, -133L, 
    -203L, -165L, -226L)), class = "data.frame", row.names = c("6", 
"7", "14", "16", "17", "21", "23", "24", "28", "29"))

Upvotes: 1
