Create column sequence based on other columns in R

Question

I have data such as this.

library(readr)
data.sample <- read_table2('score_label treatment   score   data1   data2   data3
A   treatment   1   1   t   yt
A   treatment   2   1   t   yt
A   treatment   3   5   f   yt
B   treatment   1   5   f   yt
B   treatment   2   5   f   yt
B   treatment   3   5.5 g   yt
B   treatment   4   6.8 t   yt
C   treatment   1   9.4 t   yt
C   treatment   2   10.7    f   yt
C   treatment   3   12  j   yt
C   treatment   4   13.3    t   yt
C   control 1   14.6    t   yt
C   control 3   18.5    k   yt
C   control 4   19.8    t   yt')

I would like to create a data frame like the one below, in which every score_label/treatment group has scores running from 1 to 4, and 0 is filled into the cells of any row whose score was not previously present.

output <- read_table2('score_label   treatment   score   data1   data2   data3
A   treatment   1   1   t   yt
A   treatment   2   1   t   yt
A   treatment   3   5   f   yt
A   treatment   4   0   0   0
B   treatment   1   5   f   yt
B   treatment   2   5   f   yt
B   treatment   3   5.5 g   yt
B   treatment   4   6.8 t   yt
C   treatment   1   9.4 t   yt
C   treatment   2   10.7    f   yt
C   treatment   3   12  j   yt
C   treatment   4   13.3    t   yt
C   control 1   14.6    t   yt
C   control 2   0   0   0
C   control 3   18.5    k   yt
C   control 4   19.8    t   yt')

I thought of doing this to create a new score column, but it's not working how I hoped it would. Any suggestions appreciated!!

data.sample %>%
  group_by(score_label, treatment) %>%
  mutate(new_score = seq(4))

akrun · Accepted Answer

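We can use complete() from tidyr with its fill argument: it adds the missing score rows within each group and fills the remaining columns with the supplied values.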

library(dplyr)
library(tidyr)
data.sample %>% 
    group_by(score_label, treatment) %>% 
    # fill values must match each column's type:
    # data1 is numeric, data2 and data3 are character
    complete(score = unique(data.sample$score),
          fill = list(data1 = 0, data2 = '0', data3 = '0'))
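
As a quick check of what complete() does within groups, here is a minimal sketch on a cut-down version of the data. The tibble `small`, the result name `filled`, and the explicit range 1:4 are assumptions made for illustration; they are not part of the original post.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down data: group A lacks score 4, group C/control lacks score 2
small <- tibble::tribble(
  ~score_label, ~treatment,  ~score, ~data1, ~data2,
  "A",          "treatment", 1,      1,      "t",
  "A",          "treatment", 2,      1,      "t",
  "A",          "treatment", 3,      5,      "f",
  "C",          "control",   1,      14.6,   "t",
  "C",          "control",   3,      18.5,   "k",
  "C",          "control",   4,      19.8,   "t"
)

filled <- small %>%
  group_by(score_label, treatment) %>%
  # expand each group to scores 1..4; newly created rows get the fill values
  complete(score = 1:4, fill = list(data1 = 0, data2 = "0")) %>%
  ungroup()

# each score_label/treatment group should now have four rows
count(filled, score_label, treatment)
```

Stating the range as 1:4 rather than unique(data.sample$score) may be safer when some score value happens to be missing from every group.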

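If there are many columns to fill, the fill list can be constructed programmatically, choosing 0 or '0' according to each column's type

nm1 <- names(data.sample)[startsWith(names(data.sample), 'data')]
# numeric columns get 0, all others get '0' so the fill type matches the column
fillcols <- lapply(data.sample[nm1], function(x) if (is.numeric(x)) 0 else '0')
data.sample %>% 
  group_by(score_label, treatment) %>% 
  complete(score = unique(data.sample$score), fill = fillcols)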
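
For context, the mutate(new_score = seq(4)) attempt from the question fails because mutate() only adds columns to existing rows; it cannot lengthen a group. The sketch below illustrates this on a hypothetical three-row group (`grp` and `res` are names invented for the example):

```r
library(dplyr)

# Hypothetical three-row group standing in for score_label A
grp <- tibble::tibble(score_label = "A", score = 1:3)

# mutate() keeps the number of rows fixed, so a length-4 vector does not fit
res <- tryCatch(
  grp %>% group_by(score_label) %>% mutate(new_score = seq(4)),
  error = function(e) "error"
)
res
```

This is why the missing rows have to be created first, which is the job complete() does.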
