asangoi

Reputation: 53

How to convert a bar histogram into a line histogram in R

I've seen many examples of density plots, but a density plot's y-axis shows probability density. What I am looking for is a line plot (like a density plot) whose y-axis shows counts (like a histogram).

I can do this in Excel: I manually create the bins and frequencies, make a bar histogram, and then change the chart type to a line. I can't find anything similar in R.

I've looked at both base graphics and ggplot2 but can't find an answer. I understand that histograms are meant to be bars, but I think representing them as a continuous line makes more visual sense.

Upvotes: 5

Views: 23829

Answers (5)

DorinPopescu

Reputation: 725

There is a very simple and fast way for count data.

First let's generate some dummy count data:

my.count.data = rpois(n = 10000, lambda = 3)

And then the plotting command (assuming you have called library(magrittr)):

my.count.data %>% table %>% plot

Upvotes: 0

M. Olaru

Reputation: 51

This is an old question, but I thought it might be helpful to post a solution that specifically addresses your question.

In ggplot2, you can plot a histogram and display the count with bars using:

# `data` is a data frame; `value` stands in for your numeric column
ggplot(data, aes(x = value)) +
  geom_histogram()

You can also plot a histogram and display the count with lines using a frequency polygon:

# `data` is a data frame; `value` stands in for your numeric column
ggplot(data, aes(x = value)) +
  geom_freqpoly()
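As a self-contained sketch of the frequency polygon approach (the data frame `df`, its column `value`, the seed, and the bin width of 0.25 are all assumptions made up for illustration):

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the example is reproducible

# Hypothetical example data: 1000 draws from a standard normal
df <- data.frame(value = rnorm(1000))

# Frequency polygon: same binning as a histogram, drawn as a line,
# with counts (not densities) on the y-axis
p <- ggplot(df, aes(x = value)) +
  geom_freqpoly(binwidth = 0.25) +
  labs(x = "Value", y = "Count")

print(p)
```

Setting `binwidth` explicitly plays the role of choosing the bins by hand in Excel; without it ggplot2 falls back to 30 bins and prints a message suggesting you pick a better value.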

For more info, see the ggplot2 reference.

Upvotes: 4

eafpres

Reputation: 171

Although this is old, I thought the following might be useful. Let's say you have a data set of 10,000 points, and you believe they belong to a certain distribution, and you would like to plot the histogram of the actual data and the line of the probability density of the ideal distribution on top of it.

noise <- 2
#
# the noise is tagged onto the end using runif
# just to demo issues w/real data and fitting
# the subtraction causes the data to have some
# negative values, which must be addressed in 
# the fit later on
#
# rlnorm() takes meanlog and sdlog (on the log scale)
noisylognorm <- rlnorm(10000, 
                        meanlog = 0.25, 
                        sdlog = 1) + 
                        (noise * runif(10000) - noise / 10)
#
# using package fitdistrplus
#
library(fitdistrplus)
#
# subset is used to remove the negative values
# as the lognormal distribution needs positive only
#
fitlnorm <- fitdist(subset(noisylognorm, 
                           noisylognorm > 0),
                           "lnorm")
# estimate density from a sample drawn with the fitted parameters
fitlnorm_density <- density(rlnorm(10000, 
                                   meanlog = fitlnorm$estimate["meanlog"],
                                   sdlog = fitlnorm$estimate["sdlog"]))
hist(subset(noisylognorm, 
            noisylognorm < 25),
     breaks = seq(-1, 25, 0.5),
     col = "lightblue",
     xlim = c(0, 25),
     xlab = "value",
     ylab = "frequency",
     main = paste0("Log Normal Distribution\n",
                   "noise = ", noise))

lines(fitlnorm_density$x, 
      10000 * fitlnorm_density$y * 0.5,
      type="l",
      col = "red")

Note the * 0.5 in the call to lines(). It accounts for the width of the hist() bars: the expected count in a bin is roughly the number of points times the bin width times the density, and the breaks here are 0.5 apart.
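To check that scaling, here is a minimal sketch that compares observed bin counts with n × bin width × density evaluated at the bin midpoints. The seed, the variable names, and the use of the true lognormal density via dlnorm() (instead of a fitted one) are assumptions made for the illustration:

```r
set.seed(42)  # assumed seed, only so the example is reproducible

n <- 10000
binwidth <- 0.5

# Draw lognormal data and keep the range shown in the histogram above
x <- rlnorm(n, meanlog = 0.25, sdlog = 1)
x <- x[x < 25]

# Bin the data without plotting
h <- hist(x, breaks = seq(0, 25, by = binwidth), plot = FALSE)

# Expected count per bin: number of points * bin width * density at the midpoint
expected <- length(x) * binwidth * dlnorm(h$mids, meanlog = 0.25, sdlog = 1)

# Side-by-side comparison for the first few bins
print(head(cbind(mid = h$mids, observed = h$counts, expected = round(expected))))
```

The agreement is only approximate: the midpoint value stands in for the average density over the bin, so it is weakest where the density changes quickly (here, in the first bin).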

Upvotes: 0

CnrL

Reputation: 2589

Using default R graphics (i.e. without installing ggplot) you can do the following, which might also make what the density function does a bit clearer:

# Generate some data
data <- rnorm(1000)
# Get the density estimate
dens <- density(data)
# Plot y-values scaled by number of observations against x values
plot(dens$x, length(data) * dens$y, type = "l", xlab = "Value", ylab = "Count estimate")
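Scaling the density by the number of observations gives a count per unit of x, not a count per bin. If you want the line to pass through the actual histogram counts, as in the Excel workflow described in the question, one possible base R sketch is to let hist() do the binning without plotting and then draw the counts against the bin midpoints (the seed and the choice of roughly 30 bins are assumptions for illustration):

```r
set.seed(1)  # assumed seed, only so the example is reproducible

# Generate some data
data <- rnorm(1000)

# Compute the histogram bins without drawing the bars;
# breaks = 30 is only a suggestion to hist(), which picks "pretty" breakpoints
h <- hist(data, breaks = 30, plot = FALSE)

# Draw the bin counts as a line through the bin midpoints
plot(h$mids, h$counts, type = "l", xlab = "Value", ylab = "Count")
```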

Upvotes: 8

Richie Cotton

Reputation: 121077

To adapt the example on the ?stat_density help page:

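library(ggplot2)
# The movies data set was moved out of ggplot2 into the ggplot2movies package
data(movies, package = "ggplot2movies")
m <- ggplot(movies, aes(x = rating))
# Standard density plot.
m + geom_density()
# Density plot with y-axis scaled to counts.
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
m + geom_density(aes(y = after_stat(count)))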

Upvotes: 0
