Reputation: 13
I want to add a "Week" column to my dataset. The column should cycle through Week 1 to Week 5. I want to add it to a long data set (12 rows in this example). Is there a way to make this sequence repeat down the entire data set so that it looks like this:
ID | Week | Distance |
---|---|---|
Name 1 | Week 1 | 1000 |
Name 1 | Week 2 | 1500 |
Name 1 | Week 3 | 1100 |
Name 1 | Week 4 | 900 |
Name 1 | Week 5 | 1400 |
Name 2 | Week 1 | 1300 |
Name 2 | Week 2 | 1050 |
Name 2 | Week 3 | 1500 |
Name 2 | Week 4 | 1190 |
Name 2 | Week 5 | 950 |
Name 3 | Week 1 | 1350 |
Name 3 | Week 2 | 1200 |
I have tried doing:
df %>% mutate(week = c(1:5))
However, I come up with this error:
`week` must be size 12 or 1, not 5.
Upvotes: 1
Views: 63
Reputation: 73572
Using `rep_len`:
> transform(d, week=rep_len(1:5, nrow(d)))
ID Distance week
1 Name 1 1000 1
2 Name 1 1500 2
3 Name 1 1100 3
4 Name 1 900 4
5 Name 1 1400 5
6 Name 2 1300 1
7 Name 2 1050 2
8 Name 2 1500 3
9 Name 2 1190 4
10 Name 2 950 5
11 Name 3 1350 1
12 Name 3 1200 2
or
> transform(d, week=paste('Week', rep_len(1:5, nrow(d))))
ID Distance week
1 Name 1 1000 Week 1
2 Name 1 1500 Week 2
3 Name 1 1100 Week 3
4 Name 1 900 Week 4
5 Name 1 1400 Week 5
6 Name 2 1300 Week 1
7 Name 2 1050 Week 2
8 Name 2 1500 Week 3
9 Name 2 1190 Week 4
10 Name 2 950 Week 5
11 Name 3 1350 Week 1
12 Name 3 1200 Week 2
Note that a plain vector of length n is recycled without `rep_len` only if `nrow(d) %% n == 0`, i.e. when n divides the number of rows:
> t(nrow(d) %% 1:12 == 0)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
E.g. `n == 6`:
> transform(d, week=1:6)
ID Distance week
1 Name 1 1000 1
2 Name 1 1500 2
3 Name 1 1100 3
4 Name 1 900 4
5 Name 1 1400 5
6 Name 2 1300 6
7 Name 2 1050 1
8 Name 2 1500 2
9 Name 2 1190 3
10 Name 2 950 4
11 Name 3 1350 5
12 Name 3 1200 6
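To make the divisibility point concrete, here is a minimal sketch on a hypothetical 12-row data frame `d2` (made-up values, not the asker's data). It assumes only base R: a length-5 vector is rejected because 5 does not divide 12, while `rep_len` handles any length.

```r
# Hypothetical 12-row data frame standing in for the real data
d2 <- data.frame(ID = rep(c("Name 1", "Name 2", "Name 3"), times = c(5, 5, 2)),
                 Distance = seq(1000, by = 50, length.out = 12))

# 12 %% 5 != 0, so base R refuses to recycle 1:5 across the rows
res <- tryCatch(transform(d2, week = 1:5), error = function(e) e)
failed <- inherits(res, "error")

# rep_len() cuts the last repetition short, so any row count works
ok <- transform(d2, week = rep_len(1:5, nrow(d2)))
```

Here `failed` ends up `TRUE`, and `ok$week` runs 1 to 5 twice and then 1, 2.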
*Data:*
> dput(d)
structure(list(ID = c("Name 1", "Name 1", "Name 1", "Name 1",
"Name 1", "Name 2", "Name 2", "Name 2", "Name 2", "Name 2", "Name 3",
"Name 3"), Distance = c(1000L, 1500L, 1100L, 900L, 1400L, 1300L,
1050L, 1500L, 1190L, 950L, 1350L, 1200L), week = c(1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(NA, -12L), class = "data.frame")
Upvotes: 0
Reputation: 698
If you just need to repeat "Week 1" to "Week 5" until the data frame runs out, you could apply `sapply` to a sequence representing the row numbers.
Sample data frame:
df=data.frame(ID=c(rep('Name 1',5), rep('Name 2',5), rep('Name 3', 2)),
distance=rnorm(12))
The following call repeats the numbers 1 to 5 via a modulo operation on the row numbers:
sapply(1:dim(df)[1], FUN=function(x) ((x-1) %% 5)+1)
Thus
paste0("Week ",sapply(1:dim(df)[1], FUN=function(x) ((x-1) %% 5)+1))
produces the column you are after.
NB: This method will not work if the rows are not sorted and/or there are varying numbers of weeks per ID.
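If the rows are not grouped neatly, or IDs have different numbers of weeks, one possible workaround is to count rows within each ID instead of using the global row number. The sketch below uses base R's `ave` on a hypothetical, deliberately shuffled data frame `d3`; it assumes the rows within each ID are already in chronological order, which may not hold for real data.

```r
# Hypothetical data: IDs are interleaved and have unequal numbers of rows
d3 <- data.frame(
  ID = c("Name 2", "Name 1", "Name 1", "Name 3", "Name 2", "Name 1"),
  Distance = c(1300, 1000, 1500, 1350, 1050, 1100)
)

# ave() splits the row positions by ID and applies seq_along() to each group,
# giving a counter that restarts at 1 for every ID
week_no <- ave(seq_along(d3$ID), d3$ID, FUN = seq_along)

# Assumption: the k-th row of an ID corresponds to week k
d3$week <- paste("Week", week_no)
```

With this input, "Name 1" gets Week 1 to Week 3, "Name 2" gets Week 1 and Week 2, and "Name 3" gets Week 1, regardless of where the rows sit in the data frame.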
Upvotes: 0
Reputation: 155
`c(1:5)` produces the same values as `c(1, 2, 3, 4, 5)`. The error message is telling you that you are trying to add a new column of length 5 to a data frame with 12 rows. Something like this should work:
mutate(df, week = rep(c(1:5), length.out = nrow(df)))
By wrapping the vector `c(1:5)` in `rep`, we repeat it until we get a vector of the length specified in the `length.out` argument, which we set to `nrow(df)`, the number of rows in the data frame.
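To see what `length.out` does in isolation, here is a small sketch without any data frame. The row count of 12 is hard-coded as an assumption matching the question; `rep_len` from base R is shown only as a cross-check.

```r
n_rows <- 12  # assumed row count, matching the question

# Repeat 1:5 and stop as soon as n_rows values have been produced
weeks <- rep(1:5, length.out = n_rows)
print(weeks)
# [1] 1 2 3 4 5 1 2 3 4 5 1 2

# rep_len() is a shorthand for the same operation
same <- identical(weeks, rep_len(1:5, n_rows))
```

The last repetition is simply cut off after two values, which is why this works even though 5 does not divide 12.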
Upvotes: 2