Reputation: 11381
Consider this set of dates, x:
set.seed(1234)
x <- sample(1980:2010, 100, replace = T)
x <- strptime(x, '%Y')
x <- strftime(x, '%Y')
The following is a distribution of the years of those dates:
> table(x)
x
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994
   4    4    3    3    6    4    3    4    5   12    1    1    1    2
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
   9    4    2    1    4    4    2    1    4    1    4    3    4    3
2010
   1
Now say I want to group them by decade. For this, I use the cut function:
> table(cut(x, seq(1980, 2010, 10)))
Error in cut.default(x, seq(1980, 2010, 10)) : 'x' must be numeric
OK, so let's force x to numeric:
> table(cut(as.numeric(x), seq(1980, 2010, 10)))
(1.98e+03,1.99e+03]    (1.99e+03,2e+03]    (2e+03,2.01e+03]
                 45                  28                  23
Now, as you can see, the labels (names) of that table are in scientific notation. How do I force them not to be in scientific notation? I've tried wrapping the whole command above inside format, formatC and prettyNum, but all those do is format the frequencies.
Upvotes: 1
Views: 522
Reputation: 193657
This doesn't exactly answer the question you asked, but shows you a possible alternative: use the fact that cut has methods for date and date-time objects (cut.Date, and cut.POSIXt for the POSIXlt values that strptime returns):
set.seed(1234)
x <- sample(1980:2010, 100, replace = T)
x <- strptime(x, '%Y')
out <- table(cut(x, "10 years"))
out
#
# 1980-01-01 1990-01-01 2000-01-01 2010-01-01
#         48         25         26          1
Here, we also get what I would consider the "correct" values for each bin.
As a crude justification of my statement about "correct" values, consider the values we get when we manually calculate based on table:
y <- strftime(x, '%Y')
Tab <- table(y)
Tab
# y
# 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1994 1995 1996
#    4    4    3    3    6    4    3    4    5   12    1    1    1    2    9    4
# 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2010
#    2    1    4    4    2    1    4    1    4    3    4    3    1
sum(Tab[grepl("198", names(Tab))])
# [1] 48
sum(Tab[grepl("199", names(Tab))])
# [1] 25
sum(Tab[grepl("200", names(Tab))])
# [1] 26
sum(Tab[grepl("201", names(Tab))])
# [1] 1
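The same manual check can probably be done without pattern matching, by mapping each year to the first year of its decade with integer division. This is only a minimal sketch: the years vector below is a made-up fixed sample (not the x from the question), so the result does not depend on the random seed, and the names decade and decade_counts are just illustrative.

```r
# Hypothetical fixed sample of years (an assumption, not the question's data)
years <- c(1980, 1984, 1989, 1990, 1995, 1999, 2000, 2007, 2010)

# Integer-divide by 10 and multiply back: 1984 -> 1980, 1995 -> 1990, ...
decade <- (years %/% 10) * 10

# Count how many years fall in each decade
decade_counts <- table(decade)
decade_counts
```

With this approach 1990 lands in the 1990 group rather than in the 1980s, which matches the grouping used in the grepl sums.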
Upvotes: 1
Reputation: 11381
Thanks joran for pointing the way to the answer. I'll elaborate on it here for the record:
Changing cut's dig.lab parameter from the default 3 to 4 solved this particular mockup as well as my real problem:
> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4))
(1980,1990] (1990,2000] (2000,2010]
         45          28          23
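For anyone wondering why this works: as far as I understand it, dig.lab sets the number of significant digits cut uses when it builds the interval labels from the breaks, so a four-digit year needs at least 4. Here is a small sketch comparing the labels only; the years vector is a made-up fixed sample, and labels_default and labels_4 are illustrative names.

```r
# Hypothetical fixed sample of years (an assumption, not the question's data)
years  <- c(1981, 1985, 1990, 1995, 2000, 2005, 2010)
breaks <- seq(1980, 2010, 10)

# Default dig.lab = 3: four-digit breaks do not fit in three significant
# digits, so the labels fall back to scientific notation
labels_default <- levels(cut(years, breaks))

# dig.lab = 4: enough digits to print the breaks as plain years
labels_4 <- levels(cut(years, breaks, dig.lab = 4))

labels_default
labels_4
```

Only the labels change; the bins themselves, and therefore the counts, are the same in both calls.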
By the way, in order for 1980 to be counted, one should add the include.lowest argument:
> table(cut(as.numeric(x), seq(1980, 2010, 10), dig.lab = 4, include.lowest = T))
[1980,1990] (1990,2000] (2000,2010]
         49          28          23
Now it sums to 100! :)
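One caveat: with the default right-closed intervals, 1990 is counted in the first bin and 2000 in the second. If calendar decades are what you want, a possible variation is to close the intervals on the left with right = FALSE and extend the breaks to 2020 so that 2010 gets its own bin. A minimal sketch, again with a made-up fixed years vector rather than the real x; by_decade is an illustrative name.

```r
# Hypothetical fixed sample of years (an assumption, not the question's data)
years <- c(1980, 1984, 1989, 1990, 1995, 1999, 2000, 2007, 2010)

# Breaks run to 2020 so that 2010 falls inside the last interval
breaks <- seq(1980, 2020, 10)

# right = FALSE gives left-closed bins: [1980,1990), [1990,2000), ...
by_decade <- table(cut(years, breaks, right = FALSE, dig.lab = 4))
by_decade
```

Here 1990 is counted in [1990,2000) and 2000 in [2000,2010), and include.lowest is no longer needed for 1980.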
Upvotes: 3