Logan Liao

Reputation: 11

R ggplot too many categorical labels

I have a plot with too many groups, and the axis labels are all bunched up, which looks pretty bad. Is there a way to add spacing or padding between the categories on the axis?

sea_abnb %>%
  group_by(neighbourhood) %>%
  summarize(neigh_med = median(price)) %>%
  ggplot(aes(x=reorder(neighbourhood, neigh_med), y=neigh_med)) +
    labs(x="Neighbourhood", y="Median Price per Night") +
    geom_point() +
    coord_flip() 

My Plot Here

Upvotes: 1

Views: 1443

Answers (2)

desval

Reputation: 2435

When you save the plot you can set its dimensions, so you can stretch it out by increasing the height. It is usually better to put the labels on the longer axis of the plot. That said, this type of plot might not be the best way to present your data.

ggsave(
  filename = "my_plot.png",
  width = 400,
  height = 800,
  units = "mm"  # pick exactly one of "in", "cm" or "mm"
)
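
If you do not want to guess the height, you can derive it from the number of labels. This is only a rough sketch: plot_height_in is a hypothetical helper, and the 0.2 inches per label and 1 inch margin are assumed values you would tune by eye.

# Hypothetical helper: pick a plot height (in inches) from the number of labels.
# per_label and margin are assumptions, not ggplot2 defaults.
plot_height_in <- function(n_labels, per_label = 0.2, margin = 1) {
  margin + n_labels * per_label
}

# Example: 70 categories on the label axis
h <- plot_height_in(70)
h

You would then pass the result to ggsave, e.g. ggsave("my_plot.png", width = 6, height = h, units = "in"). Keep in mind that ggsave refuses dimensions above 50 inches unless you set limitsize = FALSE.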

One possibility would be to split the data into two panels, and maybe add bars to improve the looks:

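library(dplyr)
library(ggplot2)

# Simulated data: 70 groups with random median prices
d <- data.frame(neighbourhood = paste0("ID", 1:70), neigh_med = runif(70, 0, 100))

# Flag each group as at/below ("l") or above ("h") the overall median
d <- d %>% mutate(t = ifelse(neigh_med <= median(neigh_med), "l", "h"))

ggplot(d, aes(x = reorder(neighbourhood, neigh_med), y = neigh_med)) +
  labs(x = "Neighbourhood", y = "Median Price per Night") +
  geom_bar(stat = "identity", width = 0.2) +
  geom_point() +
  coord_flip() +
  # One panel per half of the data, each with its own label axis
  facet_wrap(~t, scales = "free_y") +
  # Hide the facet strips, since "l" and "h" mean nothing to the reader
  theme(
    strip.background = element_blank(),
    strip.text.x = element_blank()
  )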

Finally, if some labels are much longer than others, you can abbreviate them with substr().
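
A minimal sketch of that idea, using base R only. abbreviate_label is a hypothetical helper, the 12-character limit is an assumption, and the sample names are made up for illustration.

# Hypothetical helper: cut labels longer than max_chars and mark the cut with "..."
abbreviate_label <- function(x, max_chars = 12) {
  x <- as.character(x)
  too_long <- nchar(x) > max_chars
  # Keep the first (max_chars - 3) characters so the result is max_chars long
  x[too_long] <- paste0(substr(x[too_long], 1, max_chars - 3), "...")
  x
}

# Made-up sample labels
labels <- c("Ballard", "North Beacon Hill", "Lower Queen Anne")
abbreviate_label(labels)

In the plot you could apply it through the scale, e.g. scale_x_discrete(labels = abbreviate_label), rather than changing the column itself. That way two neighbourhoods whose shortened names collide still stay separate categories.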

Upvotes: 1

Ian Campbell

Reputation: 24790

@Allan Cameron is right, a map would be a great way to plot this data.

I've been meaning to give rgdal a try, and this question gave me an excuse.

Let's plot median listing price by zipcode.

library(rgdal)
library(ggplot2)
library(dplyr)
library(stringr)
library(broom)
library(data.table)
sea_abnb <- fread("http://data.insideairbnb.com/united-states/wa/seattle/2020-03-17/data/listings.csv.gz")

temp <- tempfile()
temp2 <- tempfile()
download.file("https://opendata.arcgis.com/datasets/83fc2e72903343aabff6de8cb445b81c_2.zip",temp)
unzip(zipfile = temp, exdir = temp2)
seat.shp <- readOGR(temp2, stringsAsFactors = FALSE)
seat.shp@data$id <- rownames(seat.shp@data)
seat.points <- tidy(seat.shp, region="id") %>% left_join(seat.shp@data, by="id")
seat.df <- sea_abnb %>%
  mutate(ZIPCODE = str_extract(zipcode,"^[0-9]{5}"),
         # Strip "$" and thousands separators so values like "$1,250.00" parse
         price = as.numeric(str_replace_all(price, "[$,]", ""))) %>%
  group_by(ZIPCODE) %>%
  summarize(zip_med = median(price,na.rm=TRUE)) %>% 
  dplyr::select(ZIPCODE, zip_med) %>% 
  right_join(seat.points)

ggplot() + geom_polygon(data = seat.df, aes(x = long, y = lat, group = ZIPCODE, fill = zip_med), colour = "black") +
  coord_map(xlim= c(-122.5,-122.05), ylim = c(47.45,47.8)) + labs(y= "Latitude", x = "Longitude") + 
  guides(fill=guide_legend(title="Median Zipcode\nListing Price"))

Upvotes: 1
