Angelo

Reputation: 5059

Binning decimal data and plotting a histogram

I have a dataset in a file, which looks like this:

0.0707526823
0.4859753978
0.0084166789
0.0694709558
0.0156410467
0.3783259831
0.8977261856
0.7981824881
0.2079852045
0.9498437264
0.9264972044
0.1878358734
0.0020816686
0.0024611297
0.4250464895
0.0725748666
0.0407962054
0.8282363221
0.8408343333
0.7129760016
0.2772250135
0.3677588953
0.4723908637
0.9452814318

I want to bin this data with an interval of 0.1 and plot a histogram.

I tried using R, and here is what I was doing:

x<-read.table("filex", header=T)
breaks=seq (min, max, step)
hist (x$col1, breaks)

But this does not work in my case :(

Any one-liner in awk or R is welcome.

Thank you

Upvotes: 0

Views: 5882

Answers (1)

Blue Magister

Reputation: 13363

It looks like you need to better specify breaks with something like min(x) and max(x).

x <- read.table(textConnection("
    0.0707526823
    0.4859753978
    0.0084166789
    0.0694709558
    0.0156410467
    0.3783259831
    0.8977261856
    0.7981824881
    0.2079852045
    0.9498437264
    0.9264972044
    0.1878358734
    0.0020816686
    0.0024611297
    0.4250464895
    0.0725748666
    0.0407962054
    0.8282363221
    0.8408343333
    0.7129760016
    0.2772250135
    0.3677588953
    0.4723908637
    0.9452814318
"))

# extract vector of numeric from current data frame
x <- x$V1

# create breaks for frequency
# need to add a padding factor to make things equally spaced
step <- .1
pad <- step - ((max(x) - min(x)) %% step)/2
breaks <- seq(min(x) - pad, max(x) + pad, by = step)

# alternative (only good for exact decimal increments):
# use floor and ceiling; note this overwrites the breaks computed above
breaks <- floor(min(x)*10):ceiling(max(x)*10)/10

# create histogram
# equally spaced breaks create frequency chart automatically
hist(x,breaks)
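
If you only need the per-bin counts, or want to check them before plotting, the sketch below may help. It assumes every value lies in [0, 1] (true for the sample in the question, but not guaranteed for your full file), so fixed breaks at multiples of 0.1 are enough. The names `bins`, `counts` and `h` are only illustrative.

```r
# Sample values copied from the question
x <- c(0.0707526823, 0.4859753978, 0.0084166789, 0.0694709558,
       0.0156410467, 0.3783259831, 0.8977261856, 0.7981824881,
       0.2079852045, 0.9498437264, 0.9264972044, 0.1878358734,
       0.0020816686, 0.0024611297, 0.4250464895, 0.0725748666,
       0.0407962054, 0.8282363221, 0.8408343333, 0.7129760016,
       0.2772250135, 0.3677588953, 0.4723908637, 0.9452814318)

# Assumption: the data stay within [0, 1], so these breaks cover everything
breaks <- seq(0, 1, by = 0.1)

# Assign each value to a bin; include.lowest = TRUE keeps an exact 0 in the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# Tabulate the number of values per bin (empty bins are kept as zero counts)
counts <- table(bins)
print(counts)

# hist() with the same breaks computes the same counts; plot = FALSE skips drawing
h <- hist(x, breaks = breaks, plot = FALSE)
```

Calling `plot(h)` afterwards, or `hist(x, breaks = breaks)` directly, draws the histogram. If your real data can fall outside [0, 1], `hist()` will stop with an error because the breaks do not span the range of `x`, in which case the `min(x)`/`max(x)` based breaks above are the safer choice.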

Upvotes: 3
