Reputation: 10604
I'd like to make a tile plot from a data set I have of event occurrences by year. For example, I have data something like this:
set.seed(123)
data <- data.frame(years = sample(2000:2010, 50, replace = T))
I'd like to plot these as a tile plot with x = year, but maintain separation (y direction) between events in the years in which multiples occur. The problem is that I have no other column to give me a consecutive y value for year multiples.
To illustrate, I have this:
data[data$years == 2002, ]
[1] 2002 2002 2002 2002
And I think I need something like this:
data[data$years == 2002, ]
years index
1 2002 1
2 2002 2
3 2002 3
4 2002 4
Then I could tile with x = years and y = index.
Thanks for any suggestions!
Upvotes: 2
Views: 116
Reputation: 193517
In the spirit of sharing, here is another way to do this in base R:
stack(with(data, by(years, years, FUN = seq_along)))
Here are the first few lines:
> head(stack(with(data, by(years, years, FUN = seq_along))), 10)
values ind
1 1 2000
2 2 2000
3 3 2000
4 1 2001
5 2 2001
6 3 2001
7 4 2001
8 5 2001
9 1 2002
10 2 2002
For that matter, any of the split-apply-combine approaches would probably be appropriate, such as these:
stack(lapply(split(data$years, data$years), seq_along))
stack(tapply(data$years, data$years, FUN = seq_along))
However, the ave solution from @Arun and the "plyr" solution from @juba would be much more appropriate for adding columns to a multi-column dataset than these, if only because of their flexibility.
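To illustrate the multi-column point, here is a minimal sketch. The `events` data frame and its `label` column are made up for this example and are not from the question; the idea is only that `ave` writes the index in place and leaves the other columns and the row order alone.

```r
# Hypothetical multi-column data; `label` stands in for any extra columns.
events <- data.frame(
  years = c(2001, 2000, 2001, 2000, 2001),
  label = c("a", "b", "c", "d", "e")
)

# ave() applies seq_along within each year and returns a vector
# aligned with the original rows, so no sorting or merging is needed.
events$index <- ave(events$years, events$years, FUN = seq_along)

events$index
# 1 1 2 2 3
```

With `stack()`, by contrast, you get a new two-column data frame sorted by group, which you would then have to match back to the remaining columns yourself.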
Upvotes: 1
Reputation: 17189
This may not be the most elegant approach, but it is another way of doing it.
set.seed(123)
data <- data.frame(years = sample(2000:2010, 50, replace = T))
cbind(data[order(data$years), ], unlist(lapply(rle(data[order(data$years), ])$lengths, FUN = seq_len)))
## [,1] [,2]
## [1,] 2000 1
## [2,] 2000 2
## [3,] 2000 3
## [4,] 2001 1
## [5,] 2001 2
## [6,] 2001 3
## [7,] 2001 4
## [8,] 2001 5
## [9,] 2002 1
## [10,] 2002 2
## [11,] 2002 3
## [12,] 2002 4
## [13,] 2002 5
## [14,] 2003 1
## [15,] 2003 2
## [16,] 2003 3
## [17,] 2003 4
## [18,] 2004 1
## [19,] 2004 2
## [20,] 2004 3
## [21,] 2004 4
## [22,] 2004 5
## [23,] 2005 1
## [24,] 2005 2
## [25,] 2005 3
## [26,] 2005 4
## [27,] 2005 5
## [28,] 2006 1
## [29,] 2006 2
## [30,] 2006 3
## [31,] 2007 1
## [32,] 2007 2
## [33,] 2007 3
## [34,] 2007 4
## [35,] 2007 5
## [36,] 2007 6
## [37,] 2008 1
## [38,] 2008 2
## [39,] 2008 3
## [40,] 2009 1
## [41,] 2009 2
## [42,] 2009 3
## [43,] 2009 4
## [44,] 2009 5
## [45,] 2009 6
## [46,] 2010 1
## [47,] 2010 2
## [48,] 2010 3
## [49,] 2010 4
## [50,] 2010 5
As per Arun's suggestion, the following is even simpler:
cbind(data[order(data$years), ], sequence(rle(data[order(data$years), ])$lengths))
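For anyone unsure what `rle` and `sequence` are doing here, a small sketch on toy vectors (the vector names are just for illustration). It also shows why the sort matters: `rle` counts consecutive runs, not occurrences per value.

```r
# Sorted input: each year forms exactly one run.
sorted_years <- c(2000, 2000, 2001, 2002, 2002, 2002)
runs <- rle(sorted_years)
runs$lengths
# 2 1 3

# sequence() expands each run length n into 1..n and concatenates them.
index <- sequence(runs$lengths)
index
# 1 2 1 1 2 3

# Unsorted input: 2000 appears twice but in two separate runs,
# so the counter restarts and the second 2000 also gets index 1.
unsorted_years <- c(2000, 2001, 2000)
unsorted_index <- sequence(rle(unsorted_years)$lengths)
unsorted_index
# 1 1 1
```

So this approach only gives a per-year index if the years are sorted first, as in the `cbind` calls above.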
Upvotes: 2
Reputation: 118779
I'd first sort the data by years and use ave as follows:
set.seed(123)
data <- data.frame(years = sample(2000:2010, 50, replace = T))
data <- data[order(data$years), , drop = FALSE]
data$index <- ave(data$years, data$years, FUN=seq_along)
# a piece of output
# years index
# 6 2000 1
# 18 2000 2
# 35 2000 3
# 15 2001 1
# 30 2001 2
# 41 2001 3
# 45 2001 4
# 46 2001 5
# 17 2002 1
# 38 2002 2
# 40 2002 3
# 47 2002 4
# 49 2002 5
Edit: You can also do it with ave without sorting, by just skipping the line that sorts:
set.seed(123)
data <- data.frame(years = sample(2000:2010, 50, replace = T))
data$index <- ave(data$years, data$years, FUN=seq_along)
> head(data)
# years index
# 1 2003 1
# 2 2008 1
# 3 2004 1
# 4 2009 1
# 5 2010 1
# 6 2000 1
Note that now the order is preserved. Now if we subset for 2002:
data[data$years == 2002, ]
# years index
# 17 2002 1
# 38 2002 2
# 40 2002 3
# 47 2002 4
# 49 2002 5
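Once the index column exists, the tiling itself could look something like the sketch below. This uses base graphics with `rect()` rather than whatever plotting package the question has in mind, and the colours and half-unit tile size are arbitrary choices of mine, so treat it as one possible way to draw it.

```r
set.seed(123)
data <- data.frame(years = sample(2000:2010, 50, replace = TRUE))
data$index <- ave(data$years, data$years, FUN = seq_along)

# Empty plotting region sized to hold one unit-square tile per event.
plot(NA,
     xlim = range(data$years) + c(-0.5, 0.5),
     ylim = c(0.5, max(data$index) + 0.5),
     xlab = "years", ylab = "index")

# One tile per row, centred on (years, index).
rect(data$years - 0.5, data$index - 0.5,
     data$years + 0.5, data$index + 0.5,
     col = "steelblue", border = "white")
```

Because each (years, index) pair is unique, no two tiles overlap, and the height of each column equals the number of events in that year.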
Upvotes: 3
Reputation: 49033
Maybe with plyr:
library(plyr)
ddply(data, .(years), mutate, index = seq_along(years))
Which gives:
years index
1 2000 1
2 2000 2
3 2000 3
4 2001 1
5 2001 2
6 2001 3
7 2001 4
8 2001 5
9 2002 1
10 2002 2
11 2002 3
12 2002 4
13 2002 5
Upvotes: 5