Reputation: 1078
I am trying to make 41 bins for a variable.
Suppose this is the variable for 2014:
d<-runif(10300, -4.740, 6.142)
d<-as.data.frame(d)
Now what I want is to know the range for each bin.
So the first bin should be:
min(d$d)
-< min(d$d)+(max(d$d)-min(d$d))/41
and so on, until you get the 41 bins.
I was doing this, but it is taking way too long:
# 42 equally spaced break points delimit 41 bins
bins <- seq(min(d$d), max(d$d), by = (max(d$d) - min(d$d)) / 41)
d$bins <- ifelse(d$d < bins[2], paste(bins[1], bins[2], sep = "-<"),
          ifelse(d$d >= bins[2] & d$d < bins[3], paste(bins[2], bins[3], sep = "-<"),
                 NA))  # NA stands in for the next nested ifelse(), and so on
And so on for every bin, but writing it out this way takes a lot of time.
So in the data frame d, I'd like to have a column giving the bin range for each value.
Is there a function to do that or a faster way to do it?
Thanks in advance!
What I am trying to do with this is make a bar plot in ggplot with the bin ranges on the x axis and, on the y axis, a count of how many observations fall within each bin.
Upvotes: 0
Views: 622
Reputation: 2262
Another version using my santoku package:
library(santoku)
d$bin <- chop_evenly(d$d, 41)
d$bin_num <- as.numeric(d$bin)
Upvotes: 0
Reputation: 388982
You can use cut() and pass the number of intervals you want as breaks.
library(dplyr)
library(ggplot2)
d %>%
  mutate(bin_range = cut(d, 41),
         bin_num = as.numeric(bin_range)) %>%
  head()
# d bin_range bin_num
#1 0.8337735 (0.834,1.1] 22
#2 -3.2143150 (-3.41,-3.15] 6
#3 3.2491203 (3.22,3.49] 31
#4 -3.7117195 (-3.94,-3.68] 4
#5 -0.4843214 (-0.493,-0.228] 17
#6 -4.0540989 (-4.21,-3.94] 3
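If you need the bin limits as numbers rather than as labels, one option is to build the breaks yourself. The following is a minimal sketch in base R, assuming the same simulated d as in the question; the seed and the names n_bins, bin_lower and bin_upper are chosen here for illustration only. Note that cut(d, 41) produces right-closed intervals (a,b] and moves the outer limits out by 0.1% of the range, whereas this sketch uses left-closed bins that start exactly at min(d$d), which is closer to the -< notation in the question.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
d <- data.frame(d = runif(10300, -4.740, 6.142))

n_bins <- 41
# 42 equally spaced break points delimit 41 bins
breaks <- seq(min(d$d), max(d$d), length.out = n_bins + 1)

# all.inside = TRUE keeps the maximum value in bin 41 instead of creating bin 42
d$bin_num   <- findInterval(d$d, breaks, all.inside = TRUE)
d$bin_lower <- breaks[d$bin_num]      # lower limit of each observation's bin
d$bin_upper <- breaks[d$bin_num + 1]  # upper limit of each observation's bin

head(d)
```

Because findInterval() is vectorised, this avoids the chain of nested ifelse() calls entirely.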
To plot, you can use ggplot2:
d %>%
  mutate(bin_range = cut(d, 41),
         bin_num = as.numeric(bin_range)) %>%
  count(bin_range) %>%
  ggplot(aes(bin_range, n)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
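As a quick check on the counts being plotted, the same tally can be produced without dplyr. This is a sketch in base R on the simulated data from the question (the seed is an assumption added for reproducibility, and counts is a name chosen here); table() counts the observations per level of cut(), and responseName names the count column n.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
d <- data.frame(d = runif(10300, -4.740, 6.142))

# One row per bin: the label produced by cut() and the number of observations in it
counts <- as.data.frame(table(bin_range = cut(d$d, 41)), responseName = "n")

nrow(counts)   # number of bins
sum(counts$n)  # total number of observations across all bins
head(counts)
```

The resulting data frame has the same two columns, bin_range and n, that the ggplot() call above maps to the x and y axes.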
Upvotes: 2