jamborta

Reputation: 5210

ggplot geom_bar vs geom_histogram

What is the difference (if any) between geom_bar and geom_histogram in ggplot? They seem to produce the same plot and take the same parameters.

Upvotes: 21

Views: 21503

Answers (3)

SpeedKarma

Reputation: 11

geom_bar() is for a categorical x variable: there are gaps between the bars because the x-values are a factor with distinct levels, and the height of each bar is the count for that level (or a y-value you supply).

geom_histogram() is for a continuous variable. Usually we put the continuous data on the x-axis, where it is divided into bins (so the bars touch each other, as the scale is continuous), and the y-axis shows the count in each bin.

When you have one categorical and one continuous variable, there is another plot you can use: geom_boxplot(). Usually the y-axis represents the continuous data, as the result is a vertical box-and-whisker plot.
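
A minimal sketch of the three cases, assuming a recent ggplot2 (where geom_bar() counts rows per level by default) and using the built-in mtcars data; the choice of cyl as the categorical variable and mpg as the continuous one is only for illustration:

```r
library(ggplot2)

# mtcars ships with R; treat cyl as categorical and mpg as continuous.
cars <- transform(mtcars, cyl = factor(cyl))

# Bar chart: one bar per factor level, height = number of rows in that level.
p_bar <- ggplot(cars, aes(x = cyl)) + geom_bar()

# Histogram: the continuous variable is cut into bins, height = rows per bin.
p_hist <- ggplot(cars, aes(x = mpg)) + geom_histogram(binwidth = 5)

# Box plot: categorical variable on x, continuous variable on y.
p_box <- ggplot(cars, aes(x = cyl, y = mpg)) + geom_boxplot()
```

Printing p_bar, p_hist and p_box in turn should show the gaps between bars in the first plot and the touching bars in the second.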

Upvotes: 1

yahiaelgamal

Reputation: 174

The default behavior is the same for both geom_bar and geom_histogram. This is because (as @csgillespie mentioned) there is an implied stat_bin when you call geom_histogram (understandable), and it is also the default statistical transformation applied to geom_bar (arguable behavior IMO). That's why you need to specify stat='identity' when you want to plot the data as is.

The stat='bin' or stat_bin() is a statistical transformation that ggplot does for you. Its output includes the variables surrounded by two dots (..count.. and ..density..). If you don't specify stat='bin' you won't get those variables.
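
A short sketch of both points. The hair data frame is made up for illustration, and after_stat(density) assumes ggplot2 3.3.0 or later, where it is the newer spelling of ..density..:

```r
library(ggplot2)

# Hypothetical pre-computed counts, one row per category.
hair <- data.frame(colour = c("black", "brown", "red"), n = c(12, 20, 5))

# stat = "identity" plots n as given instead of computing anything from the rows.
p_identity <- ggplot(hair, aes(x = colour, y = n)) +
  geom_bar(stat = "identity")

# stat_bin() computes count and density for each bin; here the bar height
# is mapped to the computed density rather than the default count.
p_density <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5)
```

On an older ggplot2, aes(y = ..density..) should play the same role in the second plot.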

Upvotes: 3

csgillespie

Reputation: 60492

  • Bar charts provide a visual presentation of categorical data. Examples:
    • The number of people with red, black and brown hair
    • Look at the geom_bar help file. The examples are all counts.
    • Wikipedia page
  • Histograms are used to plot density of interval (usually numeric) data. Examples:
    • Distributions of age and height
    • The geom_histogram help file. The examples are distributions of movie ratings.

ggplot2

After a bit more investigating, I think in ggplot2 there is no difference between geom_bar and geom_histogram. From the docs:

 geom_histogram(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)
 geom_bar(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)

I realise that in the geom_histogram docs it states:

geom_histogram is an alias for geom_bar plus stat_bin

but to be honest, I'm not really sure what this means, since my understanding of ggplot2 is that both stat_bin and geom_bar are layers (with a slightly different emphasis).
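
One way to read the "alias" remark is that a layer is a geom paired with a stat, and you can start from either side. A rough sketch, assuming a recent ggplot2 and using mtcars purely as example data:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg))

# Start from the geom: bar geometry, with the binning stat chosen by default.
p_geom <- base + geom_histogram(binwidth = 5)

# Start from the stat: binning stat, with the bar geometry named explicitly.
p_stat <- base + stat_bin(geom = "bar", binwidth = 5)

# Compare the bin counts computed for each layer.
same_counts <- all.equal(layer_data(p_geom)$count, layer_data(p_stat)$count)
```

layer_data() returns the data frame a layer is drawn from, so comparing the count columns is a quick check that the two spellings describe the same layer.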

Upvotes: 21
