G. Sozu

Reputation: 13

Mix of histogram and bar chart in ggplot2 or plotly, as is possible with hist()

I have created a mix of a histogram and a bar chart with the hist() function. The picture and code are below.

Now I want to do something similar with ggplot2 or plotly, because I want to use such a plot as an interactive plot in a Shiny app. After many hours I still haven't found a way to do it.

On the x axis of my plot I have the temperature, and on the y axis the sum of people who live within each temperature range. Above each bin I also show the real number of people for that bin: because some people can be listed several times in the same bin, I also compute the sum of, let's say, the "unique" people.

This is how it looks with hist().

As always, any help is appreciated.

# create df
mydf <- data.frame(
  City.as.ID=c("Hønefoss : Norwegen", "Hønefoss : Norwegen", "Hønefoss : Norwegen", "Hønefoss : Norwegen", "Hønefoss : Norwegen", "Hønefoss : Norwegen",
               "Jessheim : Norwegen","Jessheim : Norwegen", "Jessheim : Norwegen", "Jessheim : Norwegen", "Jessheim : Norwegen", "Jessheim : Norwegen",
               "Hanko : Finnland","Hanko : Finnland","Hanko : Finnland","Hanko : Finnland","Hanko : Finnland", "Hanko : Finnland", 
               "Espoo : Finnland","Espoo : Finnland","Espoo : Finnland","Espoo : Finnland","Espoo : Finnland","Espoo : Finnland"),
  peoplefreq=c(1,1,1,1,1,1,
               3,3,3,3,3,3,
               18,18,18,18,18,18,
               2,2,2,2,2,2),

  temperature=c(-4.93, -3.55, 0.82, 3.7, 10.18,13.41,
                -1.92, -2.6, 2.19, 4.04, 10.75, 14.18,
                -2.39, -2.54, 0.78, 2.39, 9.22, 13.41,
                -2.86, -3.51, 0.12, 2.06, 9.16, 13.35),
  row_id=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
)
mydf

# sorting the temperature column
mydf <- mydf[order(mydf$temperature),]
mydf

# from here on, all the work for the plot
mydata <- mydf
mx <- mydata$temperature
my <- mydata$peoplefreq
mc <- mydata$City.as.ID

# get the data from hist()
h <- hist(mydata$temperature, plot = FALSE)

# get the breakpoints
breaks <- data.frame(
  "start"=h$breaks[-length(h$breaks)], 
  "end"=h$breaks[-1]
)
breaks

# sum up the y values within the x bins
sums_of_y_within_x_bins <- apply(breaks, MARGIN=1, FUN=function(x) { sum(my[ mx >= x[1] & mx < x[2] ]) })
sums_of_y_within_x_bins

# sums instead of frequency
h$counts <- sums_of_y_within_x_bins

# sum up the unique values of y within the x bins
# between temperatures -5 and 0 the total is 48 people, but some of them are listed multiple times
# in reality there are only 24 people
uniqvalues_of_y <- apply(breaks, MARGIN=1, FUN=function(x) {
  newdata <- unique(subset(mydata, select = c(City.as.ID, peoplefreq)))
  sum(newdata$peoplefreq[is.element(newdata$City.as.ID, as.vector(unique(mc[ mx >= x[1] & mx < x[2] ])))])
})
uniqvalues_of_y

uniqvalues_of_y <- as.character(uniqvalues_of_y)

# the final plot as a mix of histogram and bar chart
plot(h, labels = uniqvalues_of_y , ylab="Total sum of y", col="gray")

# an attempt with ggplot2
library(ggplot2)

# this counts how many values fall within each x bin, not their sum
ggplot(mydata, aes(x=mx, fill=my)) + 
  geom_histogram(breaks=c(-5,0,5,10,15), color="black")
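One possible way to make a histogram like the attempt above sum peoplefreq per bin, rather than count rows, is the weight aesthetic that ggplot2's binning stat understands. The following is only a minimal sketch under that assumption: it rebuilds the question's temperature and peoplefreq columns with rep(), and the names brks and p are made up for illustration. It draws the bars only; the "unique" labels would still need a separately summarised data frame.

```r
library(ggplot2)

# Compact copy of the question's data (same values, six rows per city).
mydf <- data.frame(
  peoplefreq  = rep(c(1, 3, 18, 2), each = 6),
  temperature = c(-4.93, -3.55, 0.82, 3.70, 10.18, 13.41,
                  -1.92, -2.60, 2.19, 4.04, 10.75, 14.18,
                  -2.39, -2.54, 0.78, 2.39,  9.22, 13.41,
                  -2.86, -3.51, 0.12, 2.06,  9.16, 13.35)
)

# Same bin edges as in the attempt above (hypothetical name).
brks <- c(-5, 0, 5, 10, 15)

# weight = peoplefreq makes each row contribute its peoplefreq value
# to the bin total instead of contributing 1.
p <- ggplot(mydf, aes(x = temperature)) +
  geom_histogram(aes(weight = peoplefreq), breaks = brks,
                 fill = "grey80", color = "black") +
  ylab("Total sum of peoplefreq")

print(p)
```

For interactive use, a ggplot object such as p can usually be passed to plotly::ggplotly(); how well the converted plot behaves inside a Shiny app has not been checked here.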

Upvotes: 1

Views: 225

Answers (1)

Jack Brookes

Reputation: 3830

I think the graph isn't very clear and maybe you should have a different approach, but if you want to do this you can manually bin the data:

library(dplyr)
library(ggplot2)

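mydf %>%
  # assign each row to a 5-degree temperature bin
  mutate(temperature_group = cut(temperature, seq(-5, 15, by = 5))) %>%
  group_by(temperature_group, City.as.ID) %>%
  # per bin and city: the total over all rows, and the city's count taken once
  summarise(sum_peoplefreq = sum(peoplefreq), unique_people = first(peoplefreq)) %>%
  # the city grouping was dropped above, so this sums over cities within each bin
  summarise_at(vars(sum_peoplefreq, unique_people), "sum") %>% 
  # bar height = total sum, text label = sum of "unique" people
  ggplot(aes(x = temperature_group, y = sum_peoplefreq, label = unique_people)) +
  geom_col(fill = "grey80", color = "black") + 
  geom_text(nudge_y = 2) + 
  theme_classic()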
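The manual binning can be cross-checked without dplyr. The sketch below is an assumption-laden illustration, not part of the original answer: it rebuilds the question's data with rep(), and the names bin, total_per_bin, once_per_city, unique_per_bin and binned are made up here. It uses right = FALSE so the bins are left-closed like the `>=` / `<` test in the question, whereas cut() closes intervals on the right by default. No temperature in this data falls exactly on a break, so both conventions give the same result.

```r
# Compact copy of the question's data (same values, six rows per city).
mydf <- data.frame(
  City.as.ID  = rep(c("Hønefoss : Norwegen", "Jessheim : Norwegen",
                      "Hanko : Finnland", "Espoo : Finnland"), each = 6),
  peoplefreq  = rep(c(1, 3, 18, 2), each = 6),
  temperature = c(-4.93, -3.55, 0.82, 3.70, 10.18, 13.41,
                  -1.92, -2.60, 2.19, 4.04, 10.75, 14.18,
                  -2.39, -2.54, 0.78, 2.39,  9.22, 13.41,
                  -2.86, -3.51, 0.12, 2.06,  9.16, 13.35)
)

# Left-closed bins [a, b), matching the question's `>=` / `<` comparison.
mydf$bin <- cut(mydf$temperature, breaks = seq(-5, 15, by = 5), right = FALSE)

# Bar heights: sum of peoplefreq over every row in the bin.
total_per_bin <- tapply(mydf$peoplefreq, mydf$bin, sum)

# Labels: keep each city once per bin, then sum its peoplefreq.
once_per_city  <- unique(mydf[, c("bin", "City.as.ID", "peoplefreq")])
unique_per_bin <- tapply(once_per_city$peoplefreq, once_per_city$bin, sum)

binned <- data.frame(bin           = names(total_per_bin),
                     total         = as.vector(total_per_bin),
                     unique_people = as.vector(unique_per_bin))
print(binned)
```

For the question's data this gives totals of 48, 48, 20 and 28 and "unique" sums of 24, 24, 20 and 24 for the four bins, in agreement with the 48 and 24 mentioned in the question's comments for the range -5 to 0.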


Upvotes: 0
