Marc-Marijn
Marc-Marijn

Reputation: 151

(Amateur) data manipulation in R: variable levels as new column with its values from another column

I would like to create a new column which extracts 'b' and 'c' from type and puts the value in that column. There are other random columns in between as well that should be kept which I just named random.


id <- c("1", "1", "1", "1","2", "2", "2", "2", "3", "3", "3", "3")
type <- c("a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c")
random <- c("random")
value <- c("1", "2", "50", "100", "4", "5", "55", "110", "2.5", "3", "53", "105")

df <- data.frame(id, type, random, value)

  id type random value
1   1    a random     1
2   1    a random     2
3   1    b random    50
4   1    c random   100
5   2    a random     4
6   2    a random     5
7   2    b random    55
8   2    c random   110
9   3    a random   2.5
10  3    a random     3
11  3    b random    53
12  3    c random   105

What I would like to have is:

  id2 type2 random value2  b   c
1   1     a random      1 50 100
2   1     a random      2 50 100
3   2     a random      4 55 110
4   2     a random      5 55 110
5   3     a random    2.5 53 105
6   3     a random      3 53 105

Would very much appreciate some thoughts! Best,

Upvotes: 1

Views: 43

Answers (1)

akrun
akrun

Reputation: 887851

We could group by 'id', create the 'b' column by extracting the 'value' where the 'type' is 'b' (assuming single value of 'b' per group), ungroup and remove the row where 'type' is 'b'

library(dplyr)
df %>%
    group_by(id) %>%
    mutate(b = value[type == 'b']) %>% 
    ungroup %>% 
    filter(type != 'b')
# A tibble: 4 x 5
#  id    type  keepthis value b    
#  <fct> <fct> <fct>    <fct> <fct>
#1 1     a     keep     4     95   
#2 1     a     keep     5     95   
#3 2     a     keep     3     94   
#4 2     a     keep     5     94  

Update

Based on the updated dataset, we could filter the 'type' with 'b', 'c', do a reshape to 'wide' format with pivot_wider, left_join with the original dataset filtered with only 'type' 'a'

library(tidyr)
df %>%
     filter(type %in% c('b', 'c'))  %>% 
     pivot_wider(names_from = type, values_from = value) %>%
     left_join(df %>% 
             filter(type  == 'a'))
# A tibble: 6 x 6
#  id    random b     c     type  value
#* <fct> <fct>  <fct> <fct> <fct> <fct>
#1 1     random 50    100   a     1    
#2 1     random 50    100   a     2    
#3 2     random 55    110   a     4    
#4 2     random 55    110   a     5    
#5 3     random 53    105   a     2.5  
#6 3     random 53    105   a     3    

With updated dataset, the code would be

df %>%
    filter(scale_id %in% c('IM', 'RT')) %>% 
    select(Title, Task, task_id, scale_id, data_value) %>% 
    pivot_wider(names_from = scale_id, values_from  = data_value) %>% 
    left_join(df %>% 
                  filter(! scale_id %in% c('IM', 'RT')) %>% 
                  group_by(task_id) %>%
                  slice(which.max(data_value)))

Upvotes: 1

Related Questions