I have a data frame with years and runtimes:
DF = data.frame(Year = c(1800,1892,1910,2000,2004),Runtimes=c(80,10,15,10,30))
DF
Year Runtimes
1 1800 80
2 1892 10
3 1910 15
4 2000 10
5 2004 30
I am using cut() to bin the years into 10-year intervals covering the range of my data, and then plotting the resulting frequency distribution with ggplot. I noticed that, because Year is numeric, the interval labels that cut() produces are written in scientific notation (for example [1.8e+03,1.81e+03)) instead of as four-digit years.
Is there a way to keep the years in a readable form, such as [1890,1900), so that the plot is easier to read?
Here is the code that I have been playing with:
library(ggplot2)
# 10-year breaks from 1800 to 2010
yr_bins = seq(1800, 2010, 10)
rt_yr = cut(DF$Year, breaks=yr_bins, right=FALSE)
yr_freq_table = transform(table(rt_yr))
yr_freq_table
ggplot(yr_freq_table) +
  geom_bar(aes(x=rt_yr, y=Freq), fill="lightblue", color="lightslategray",
           position="stack", stat="identity") +
  ylab("Count Year (mins)") +
  scale_x_discrete(drop=FALSE) +
  theme(axis.text.x=element_text(angle=90, vjust=.5, hjust=1)) +
  ggtitle("Runtime Distribution")
The first rows of the resulting frequency table look like this:
rt_yr Freq
1 [1.8e+03,1.81e+03) 1
2 [1.81e+03,1.82e+03) 0
3 [1.82e+03,1.83e+03) 0
UPDATE: What I am trying to do is show rt_yr on the ggplot x-axis as readable 10-year ranges rather than as numbers in scientific notation.
Answer:
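You can use the dig.lab argument of cut() to prevent scientific notation. It sets the number of digits used when formatting the break values into labels (the default is 3), so dig.lab=4 is enough for four-digit years. For example: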
rt_yr = cut(DF$Year, breaks=yr_bins, right=FALSE, dig.lab=4)
# Rebuild the frequency table so it picks up the new labels
yr_freq_table = transform(table(rt_yr))
ggplot(yr_freq_table) +
  geom_bar(aes(x=rt_yr, y=Freq), fill="lightblue", color="lightslategray",
           stat="identity") +
  labs(y="Count Year (mins)") +
  scale_x_discrete(drop=FALSE) +
  theme(axis.text.x=element_text(angle=90, vjust=.5, hjust=1)) +
  ggtitle("Runtime Distribution")
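To see the difference dig.lab makes on its own, here is a quick base-R comparison using the question's yr_bins. cut() formats the breaks with formatC() to dig.lab significant digits, which is why 1800 becomes 1.8e+03 with the default of 3. The names default_labels and year_labels are just illustrative:

```r
# Decade breaks from the question: 1800, 1810, ..., 2010
yr_bins <- seq(1800, 2010, 10)

# levels() returns every interval label cut() generates, even for empty bins
default_labels <- levels(cut(yr_bins[1], breaks = yr_bins, right = FALSE))
year_labels    <- levels(cut(yr_bins[1], breaks = yr_bins, right = FALSE, dig.lab = 4))

head(default_labels, 2)  # "[1.8e+03,1.81e+03)" "[1.81e+03,1.82e+03)"
head(year_labels, 2)     # "[1800,1810)" "[1810,1820)"
```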
If you want the labels formatted a specific way, you can also set them yourself with the labels argument. For example, if we prefer a hyphen separator instead of a comma:
rt_yr = cut(DF$Year,breaks=yr_bins,
labels=paste0("[", yr_bins[-length(yr_bins)], "-", yr_bins[-1], ")"),
right=FALSE)
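The label vector pairs each lower break with the next break up, so there is exactly one label per interval (length(yr_bins) - 1 of them). A sketch that applies it to the question's DF and counts the decades; decade_labels is a hypothetical name, and as.data.frame() is used here to build the same rt_yr/Freq table as transform() in the question:

```r
DF <- data.frame(Year = c(1800, 1892, 1910, 2000, 2004),
                 Runtimes = c(80, 10, 15, 10, 30))
yr_bins <- seq(1800, 2010, 10)

# Lower bounds: all breaks except the last; upper bounds: all except the first
decade_labels <- paste0("[", yr_bins[-length(yr_bins)], "-", yr_bins[-1], ")")

rt_yr <- cut(DF$Year, breaks = yr_bins, labels = decade_labels, right = FALSE)

# One row per decade, including decades with no films
yr_freq_table <- as.data.frame(table(rt_yr))
yr_freq_table[yr_freq_table$Freq > 0, ]
```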
Answer:
Another approach is to extract the numeric lower and upper bounds from the interval labels with a regular expression:
yr_freq_table$bottom <-
as.numeric(gsub("[[](.*),(.*)[)]", "\\1", yr_freq_table$rt_yr))
yr_freq_table$top <-
as.numeric(gsub("[[](.*),(.*)[)]", "\\2", yr_freq_table$rt_yr))
head(yr_freq_table)
rt_yr Freq bottom top
1 [1.8e+03,1.81e+03) 1 1800 1810
2 [1.81e+03,1.82e+03) 0 1810 1820
3 [1.82e+03,1.83e+03) 0 1820 1830
4 [1.83e+03,1.84e+03) 0 1830 1840
5 [1.84e+03,1.85e+03) 0 1840 1850
6 [1.85e+03,1.86e+03) 0 1850 1860
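Once bottom and top are numeric, they can be pasted back into a readable four-digit label for the x-axis. A minimal sketch on a few labels in the scientific-notation form cut() produces by default; the label column is a hypothetical addition, not part of the method above:

```r
# A small table with labels as produced by cut() with the default dig.lab = 3
yr_freq_table <- data.frame(
  rt_yr = factor(c("[1.8e+03,1.81e+03)", "[1.81e+03,1.82e+03)", "[2e+03,2.01e+03)")),
  Freq  = c(1, 0, 2)
)

# Capture the text before and after the comma inside "[ ... )"
pattern <- "[[](.*),(.*)[)]"
yr_freq_table$bottom <- as.numeric(gsub(pattern, "\\1", yr_freq_table$rt_yr))
yr_freq_table$top    <- as.numeric(gsub(pattern, "\\2", yr_freq_table$rt_yr))

# Rebuild a readable label from the parsed bounds, e.g. "[2000,2010)"
yr_freq_table$label <- paste0("[", yr_freq_table$bottom, ",", yr_freq_table$top, ")")
yr_freq_table
```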