E B

Reputation: 1073

R Using cut function on dates defined as Number and format of the breaks

I have a data frame that holds years and runtimes:

DF  = data.frame(Year =  c(1800,1892,1910,2000,2004),Runtimes=c(80,10,15,10,30))
DF

  Year Runtimes
  1 1800       80
  2 1892       10
  3 1910       15
  4 2000       10
  5 2004       30

I am using cut() to create breaks of 10 years across the range of years I have, and then plotting the resulting frequency distribution in ggplot. What I notice is that after the cut, the year values, which are stored as numeric, show up in the interval labels in scientific notation rather than as 4-digit years.

Is there a way to keep the year in a more readable format like [1890,1900) instead of scientific notation?

Here is the code that I have been playing with:

library(ggplot2)

yr_bins = seq(1800,2010,10)
rt_yr = cut(DF$Year,breaks=yr_bins,right=FALSE)
yr_freq_table = transform(table(rt_yr))
yr_freq_table
ggplot(yr_freq_table) + 
      geom_bar(aes(x=rt_yr,y=Freq), fill="lightblue",color="lightslategray",
     position="stack",stat="identity") + ylab("Count Year (mins)") + 
     scale_x_discrete(drop=F) + theme(axis.text.x=element_text(angle=90,   
     vjust=.5, hjust=1)) + ggtitle("Runtime Distribution")

Sample output of yr_freq_table is below:

             rt_yr Freq

1   [1.8e+03,1.81e+03)    1
2  [1.81e+03,1.82e+03)    0
3  [1.82e+03,1.83e+03)    0

UPDATE: The issue I am trying to solve is how to show rt_yr in ggplot as readable 10-year ranges rather than in scientific notation.

Upvotes: 1

Views: 887

Answers (2)

eipi10

Reputation: 93761

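You can use the dig.lab argument in the cut function to prevent scientific notation. dig.lab defaults to 3, which is too few digits for a 4-digit year, so the labels fall back to scientific notation. For example: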

rt_yr = cut(DF$Year, breaks=yr_bins, right=FALSE, dig.lab=4)
yr_freq_table = transform(table(rt_yr))  # rebuild the table so it picks up the new labels

ggplot(yr_freq_table) + 
  geom_bar(aes(x=rt_yr, y=Freq), fill="lightblue", color="lightslategray", 
           stat="identity") +
  labs(y="Count Year (mins)") + 
  scale_x_discrete(drop=F) + 
  theme(axis.text.x=element_text(angle=90, vjust=.5, hjust=1)) + 
  ggtitle("Runtime Distribution")
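To see the effect of dig.lab without plotting, here is a minimal sketch that compares the factor levels produced with and without it. It reuses the breaks and years from the question; the variable names years, default_labels and wide_labels are made up for illustration.

```r
# Breaks and years taken from the question.
yr_bins <- seq(1800, 2010, 10)
years   <- c(1800, 1892, 1910, 2000, 2004)

# Default dig.lab (3): 4-digit years are shown in scientific notation.
default_labels <- levels(cut(years, breaks = yr_bins, right = FALSE))

# dig.lab = 4: enough digits to print the years in full.
wide_labels <- levels(cut(years, breaks = yr_bins, right = FALSE, dig.lab = 4))

head(default_labels, 3)
head(wide_labels, 3)
```

With these inputs the first level changes from [1.8e+03,1.81e+03) to [1800,1810).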


If you want the labels formatted a specific way, you can also set the labels yourself using the labels argument. For example, let's say we prefer a hyphen separator instead of a comma:

rt_yr = cut(DF$Year,breaks=yr_bins, 
        labels=paste0("[", yr_bins[-length(yr_bins)], "-", yr_bins[-1], ")"),
        right=FALSE)
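As a quick check of the custom labels, the sketch below builds the label vector separately and confirms that there is one label per interval, which cut requires (length(breaks) - 1 labels). The names lower, upper, decade_labels and years are illustrative and not part of the original code.

```r
# Breaks and years taken from the question.
yr_bins <- seq(1800, 2010, 10)
years   <- c(1800, 1892, 1910, 2000, 2004)

# Lower and upper bound of each interval.
lower <- yr_bins[-length(yr_bins)]
upper <- yr_bins[-1]

# One label per interval, using a hyphen as the separator.
decade_labels <- paste0("[", lower, "-", upper, ")")
stopifnot(length(decade_labels) == length(yr_bins) - 1)

rt_yr <- cut(years, breaks = yr_bins, labels = decade_labels, right = FALSE)
as.character(rt_yr)
```

For the five sample years this should give [1800-1810), [1890-1900), [1910-1920), [2000-2010) and [2000-2010).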


Upvotes: 2

AidanGawronski

Reputation: 2085

I like to parse the lower and upper bounds out of the interval labels into numeric columns:

yr_freq_table$bottom <- 
    as.numeric(gsub("[[](.*),(.*)[)]", "\\1", yr_freq_table$rt_yr))

yr_freq_table$top <- 
    as.numeric(gsub("[[](.*),(.*)[)]", "\\2", yr_freq_table$rt_yr))

head(yr_freq_table)

                rt_yr Freq bottom  top
1  [1.8e+03,1.81e+03)    1   1800 1810
2 [1.81e+03,1.82e+03)    0   1810 1820
3 [1.82e+03,1.83e+03)    0   1820 1830
4 [1.83e+03,1.84e+03)    0   1830 1840
5 [1.84e+03,1.85e+03)    0   1840 1850
6 [1.85e+03,1.86e+03)    0   1850 1860
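Once the bounds are numeric, they can be pasted back together into readable labels. The following is only a sketch of that extra step, which the answer itself does not show; labels_sci is a hand-copied sample of the labels above, and readable is a hypothetical name.

```r
# Two labels copied from the output above, in scientific notation.
labels_sci <- c("[1.8e+03,1.81e+03)", "[1.81e+03,1.82e+03)")

# Same regular expression as above: capture the text before and after the comma.
bottom <- as.numeric(gsub("[[](.*),(.*)[)]", "\\1", labels_sci))
top    <- as.numeric(gsub("[[](.*),(.*)[)]", "\\2", labels_sci))

# Rebuild the interval labels with plain integer years.
readable <- sprintf("[%d,%d)", as.integer(bottom), as.integer(top))
readable
```

If these labels are used on the plot axis, they would need to be converted to a factor with levels in the original interval order so that ggplot keeps the bins sorted.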

Upvotes: 0
