Andreas

Reputation: 85

Histogram plot in R

I am looking for some guidance on histogram plots.

Let's assume I have this vector (called CF):

  [,1]    
 [1,] 2275.351
 [2,] 2269.562 
 [3,] 1925.700 
 [4,] 1904.195 
 [5,] 1974.039     

I use the following call to plot this vector as a histogram:

hist(CF)

Let us now assume I have 10,000 simulated value estimates for a property. I want to plot those in a histogram (or a similar plot) where the x-axis shows the probabilities.

Such a plot would let me state something like: "with 55% probability, the value of the property exceeds $15 million."

Suggestions?

Upvotes: 2

Views: 515

Answers (2)

gung - Reinstate Monica

Reputation: 11893

I agree with @Stibu that you want the CDF. When you are talking about a set of realized data, this is called the empirical cumulative distribution function (ECDF). In R, the basic function for this is ecdf() (see ?ecdf):

CF <- read.table(text="[1,] 2275.351
[2,] 2269.562
[3,] 1925.700
[4,] 1904.195
[5,] 1974.039", header=FALSE)
CF <- as.vector(CF[,-1])  # drop the row-label column, keep the values
CF  # [1] 2275.351 2269.562 1925.700 1904.195 1974.039
dev.new()  # open a new graphics device (works on any platform)
plot(ecdf(CF))
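
To turn the ECDF into a statement like the one in the question, note that ecdf() returns a function that can be evaluated at any threshold. A minimal sketch, assuming hypothetical simulated property values (the log-normal parameters and the $15 million threshold are made up for illustration, not taken from the asker's data):

```r
set.seed(42)  # reproducible illustration only

# Hypothetical stand-in for the 10,000 simulated property values (in dollars)
sim_values <- rlnorm(10000, meanlog = log(15e6), sdlog = 0.25)

Fn <- ecdf(sim_values)     # ecdf() returns a step function
threshold <- 15e6          # assumed threshold, as in the question's example

p_below  <- Fn(threshold)  # share of simulations at or below the threshold
p_exceed <- 1 - p_below    # share of simulations above the threshold

p_exceed
```

With real simulation output, sim_values would simply be replaced by the vector of simulated estimates.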


If you are willing to download the fitdistrplus package, there are a lot of fancy versions you can play with:

library(fitdistrplus)
dev.new()  # open a new graphics device (works on any platform)
plotdist(CF)


fdn <- fitdist(CF, "norm")
fdw <- fitdist(CF, "weibull")
summary(fdw)
# Fitting of the distribution ' weibull ' by maximum likelihood 
# Parameters : 
#         estimate Std. Error
# shape   13.59732   4.833605
# scale 2149.24253  74.958140
# Loglikelihood:  -32.89089   AIC:  69.78178   BIC:  69.00065 
# Correlation matrix:
#           shape     scale
# shape 1.0000000 0.3328979
# scale 0.3328979 1.0000000
dev.new()  # open a new graphics device (works on any platform)
plot(fdn)


dev.new()  # open a new graphics device (works on any platform)
cdfcomp(list(fdn,fdw), legendtext=c("Normal","Weibull"), lwd=2)


Upvotes: 2

Stibu

Reputation: 15897

What you probably want is the cumulative distribution function (CDF). It has probability on the y-axis (not x, as you asked), but since this is the standard way to represent the information that you want, it is best to use this curve.

As an example, I drew 10,000 values from a standard normal distribution and then constructed the CDF:

CF <- rnorm(10000)
breaks <- seq(-4,4,0.5)
CDF <- sapply(breaks,function(b) sum(CF<=b)/length(CF))
plot(breaks,CDF,type="l")

From the plot you can read off, for instance, that a value below zero is drawn with a probability of 50%.
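
The same figure can also be checked numerically instead of being read from the plot. A small sketch, with a seed added only so that the illustration is reproducible (the variable names p_below_zero and med are just for this example):

```r
set.seed(1)          # reproducible illustration only
CF <- rnorm(10000)   # same setup as above: standard normal draws

# Empirical probability of a value at or below zero; should be close to 0.5
p_below_zero <- mean(CF <= 0)

# Value that 50% of the draws fall at or below; should be close to 0
med <- quantile(CF, probs = 0.5)

p_below_zero
med
```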

If you prefer a bar plot, you can use:

barplot(CDF,names.arg=breaks)

I don't know your data in detail, so I cannot give you more precise code. But basically, you will have to pick a reasonable set of breaks and then apply the code above.
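
One possible way to pick the breaks without knowing the data in advance is to derive them from the range of the values, for example with pretty(). A rough sketch, with hypothetical simulated values standing in for the real data (the mean of 15 and standard deviation of 3, in millions, are assumptions for illustration). It plots 1 - CDF, since the question asks about the probability of exceeding a value:

```r
set.seed(123)  # reproducible illustration only

# Hypothetical simulated property values, in millions of dollars
values <- rnorm(10000, mean = 15, sd = 3)

# Roughly 20 evenly spaced, round break points covering the data range
breaks <- pretty(range(values), n = 20)

# Empirical CDF at each break, computed as in the code above
CDF <- sapply(breaks, function(b) mean(values <= b))

# Exceedance curve: share of simulations above each break
plot(breaks, 1 - CDF, type = "l",
     xlab = "Value (millions)", ylab = "P(value > x)")
```

Reading the curve at x = 15 then gives the estimated probability that the value exceeds 15 million.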

Upvotes: 5
