rookiesportsanalyst

Reputation: 13

Adding a column to summarize week

I want to add a "Week" column to my dataset. The column should run from Week 1 to Week 5 and then start over. The dataset is in long format (12 rows). Is there a way to make this sequence repeat over the entire dataset so that it looks like this:

ID      Week    Distance
Name 1  Week 1  1000
Name 1  Week 2  1500
Name 1  Week 3  1100
Name 1  Week 4  900
Name 1  Week 5  1400
Name 2  Week 1  1300
Name 2  Week 2  1050
Name 2  Week 3  1500
Name 2  Week 4  1190
Name 2  Week 5  950
Name 3  Week 1  1350
Name 3  Week 2  1200

I have tried doing:

df %>% mutate(week = c(1:5))

However, I come up with this error: week must be size 12 or 1, not 5.

Upvotes: 1

Views: 63

Answers (3)

jay.sf

Reputation: 73572

Using rep_len.

> transform(d, week=rep_len(1:5, nrow(d)))
       ID Distance week
1  Name 1     1000    1
2  Name 1     1500    2
3  Name 1     1100    3
4  Name 1      900    4
5  Name 1     1400    5
6  Name 2     1300    1
7  Name 2     1050    2
8  Name 2     1500    3
9  Name 2     1190    4
10 Name 2      950    5
11 Name 3     1350    1
12 Name 3     1200    2

or

> transform(d, week=paste('Week', rep_len(1:5, nrow(d))))
       ID Distance   week
1  Name 1     1000 Week 1
2  Name 1     1500 Week 2
3  Name 1     1100 Week 3
4  Name 1      900 Week 4
5  Name 1     1400 Week 5
6  Name 2     1300 Week 1
7  Name 2     1050 Week 2
8  Name 2     1500 Week 3
9  Name 2     1190 Week 4
10 Name 2      950 Week 5
11 Name 3     1350 Week 1
12 Name 3     1200 Week 2

Note that a vector of length n can be recycled directly only if nrow(d) %% n == 0, i.e. if n divides the number of rows evenly.

> t(nrow(d) %% 1:12 == 0)
     [,1] [,2] [,3] [,4]  [,5] [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
[1,] TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE  TRUE

E.g. n == 6:

> transform(d, week=1:6)
       ID Distance week
1  Name 1     1000    1
2  Name 1     1500    2
3  Name 1     1100    3
4  Name 1      900    4
5  Name 1     1400    5
6  Name 2     1300    6
7  Name 2     1050    1
8  Name 2     1500    2
9  Name 2     1190    3
10 Name 2      950    4
11 Name 3     1350    5
12 Name 3     1200    6
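To make the divisibility rule concrete, here is a minimal sketch in base R. The data frame `d` below is a hypothetical stand-in rebuilt from the question's values (without a `week` column), and the error is caught with `tryCatch` only so that the script keeps running.

```r
# Hypothetical 12-row data frame standing in for d (values from the question)
d <- data.frame(
  ID = rep(c("Name 1", "Name 2", "Name 3"), times = c(5, 5, 2)),
  Distance = c(1000, 1500, 1100, 900, 1400, 1300,
               1050, 1500, 1190, 950, 1350, 1200)
)

# Length 6 divides 12, so the vector is recycled exactly twice
ok <- transform(d, week = 1:6)

# Length 5 does not divide 12, so base R signals an error instead of recycling
res <- tryCatch(transform(d, week = 1:5), error = function(e) e)
inherits(res, "error")
```

This is why the first two examples go through rep_len instead of passing 1:5 directly.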

*Data:*

> dput(d)
structure(list(ID = c("Name 1", "Name 1", "Name 1", "Name 1", 
"Name 1", "Name 2", "Name 2", "Name 2", "Name 2", "Name 2", "Name 3", 
"Name 3"), Distance = c(1000L, 1500L, 1100L, 900L, 1400L, 1300L, 
1050L, 1500L, 1190L, 950L, 1350L, 1200L), week = c(1L, 2L, 3L, 
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(NA, -12L), class = "data.frame")

Upvotes: 0

njp

Reputation: 698

If you just need to repeat "Week 1" to "Week 5" until the data frame runs out, you could use sapply over a sequence representing the row numbers:

Sample data frame:

df=data.frame(ID=c(rep('Name 1',5), rep('Name 2',5), rep('Name 3', 2)),
              distance=rnorm(12))

The following call just repeats the numbers 1-5 via a mod operation on the row numbers:

sapply(1:dim(df)[1], FUN=function(x) ((x-1) %% 5)+1)

Thus

paste0("Week ",sapply(1:dim(df)[1], FUN=function(x) ((x-1) %% 5)+1))

produces the column you are after.
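Since %% is vectorized in R, the same arithmetic can probably be written without sapply. A minimal sketch, where `n` is a hypothetical stand-in for the number of rows (12 in the question):

```r
# n is an assumed row count; in practice it would be nrow(df)
n <- 12

# Shift to 0-based, wrap at 5, shift back to 1-based: 1..5, 1..5, 1, 2
week_num <- (seq_len(n) - 1) %% 5 + 1

# Build the labels in one vectorized call
week <- paste("Week", week_num)
```

The result is the same as the sapply version; it simply avoids calling the anonymous function once per row.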

NB: This method will not work if the rows are not sorted and/or there are varying numbers of weeks per ID.
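If the number of weeks does vary per ID, one possible workaround is to count rows within each ID instead of across the whole data frame. The sketch below uses base R's ave for that; the sample data frame and its 5/3/2 split are made up for illustration, and it still assumes that the rows of each ID are already in week order.

```r
# Hypothetical data with a different number of weeks per ID (5, 3 and 2)
df <- data.frame(
  ID = rep(c("Name 1", "Name 2", "Name 3"), times = c(5, 3, 2)),
  distance = c(1000, 1500, 1100, 900, 1400, 1300, 1050, 1500, 1350, 1200)
)

# ave() applies seq_along() separately within each ID group,
# so the counter restarts at 1 whenever the ID changes
week_num <- ave(seq_along(df$ID), df$ID, FUN = seq_along)
df$week <- paste("Week", week_num)
```

Unlike the mod approach, this does not require the rows of one ID to be contiguous, only that they appear in week order within the ID.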

Upvotes: 0

Adam

Reputation: 155

c(1:5) is equivalent to c(1,2,3,4,5). Basically, the error message is telling you that you are trying to add a new column of length 5 to a data frame with 12 rows. Something like this should work:

mutate(df, week = rep(c(1:5), length.out = nrow(df)))

Wrapping the vector c(1:5) in rep repeats it until the result reaches the length given in the length.out argument. Setting length.out to nrow(df), the number of rows in the data frame, makes the new column exactly as long as the data.
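To see what length.out does on its own, here is a small base R sketch that leaves dplyr out so it runs without extra packages. The data frame is a hypothetical reconstruction of the question's data, and the "Week" labels are added with paste to match the desired output.

```r
# Hypothetical reconstruction of the 12-row data from the question
df <- data.frame(
  ID = rep(c("Name 1", "Name 2", "Name 3"), times = c(5, 5, 2)),
  Distance = c(1000, 1500, 1100, 900, 1400, 1300,
               1050, 1500, 1190, 950, 1350, 1200)
)

# times = 3 repeats whole cycles only, giving 15 values (too long for 12 rows)
too_long <- rep(1:5, times = 3)

# length.out cuts the last cycle short so the vector has exactly nrow(df) values
week <- rep(1:5, length.out = nrow(df))

# Attach the labelled column
df$Week <- paste("Week", week)
```

With dplyr loaded, the same rep(..., length.out = nrow(df)) expression should work inside mutate as shown in the answer.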

Upvotes: 2
