JulietteC
JulietteC

Reputation: 159

Columns merge in a same column according a dictionary

I have this data.frame :

data.frame(identifiant = paste0("I",c(1:6)),
           project1 = c(1,NA,3,NA,NA,NA),project2 = c(NA,NA,NA,4,5,NA),project3 = c(NA,NA,NA,NA,NA,6),
           subject1 = c("A","B",NA,NA,NA,NA),subject2 = c(NA,NA,"C","D","E",NA),subject3 = c(NA,NA,NA,NA,NA,"F")) 

This data :

   identifiant project1 project2 project3 subject1 subject2 subject3
1          I1        1       NA       NA        A     <NA>     <NA>
2          I2       NA       NA       NA        B     <NA>     <NA>
3          I3        3       NA       NA     <NA>        C     <NA>
4          I4       NA        4       NA     <NA>        D     <NA>
5          I5       NA        5       NA     <NA>        E     <NA>
6          I6       NA       NA        6     <NA>     <NA>        F

And I would like this data :

   identifiant project subject
1          I1       1       A
2          I2      NA       B
3          I3       3       C
4          I4       4       D
5          I5       5       E
6          I6       6       F

So a columns merging but according a 'data dictionary' :

dictionary <- data.frame(old = c("identifiant","project1", "project2", "project3","subject1", "subject2", "subject3"),
                          new = c("identifiant","project","project","project","subject","subject","subject")) 

          old         new
1 identifiant identifiant
2    project1     project
3    project2     project
4    project3     project
5    subject1     subject
6    subject2     subject
7    subject3     subject

Does anyone have a solution without going into something very complicated?

Upvotes: 1

Views: 67

Answers (2)

akrun
akrun

Reputation: 887048

Another option is coalesce to apply on the two sets of columns ('project', 'subject')

library(dplyr)
df1 %>% 
   mutate_at(vars(starts_with('subject')), as.character)%>% 
   transmute(identifiant, 
             project = coalesce(!!! select(., starts_with('project'))), 
             subject = coalesce(!!! select(., starts_with('subject'))))
#  identifiant project subject
#1          I1       1       A
#2          I2       2       B
#3          I3       3       C
#4          I4       4       D
#5          I5       5       E
#6          I6       6       F

Or if we need to use the 'dictionary' dataset, split it to a list based on the 'new' values

library(purrr)
dictionary %>%
     group_split(new, keep = FALSE)  %>% 
     map_dfc(~ .x %>% 
                 pull(old) %>% 
                 as.character(.x) %>% 
                 select(df1, .) %>%
                 mutate_all(as.character) %>% 
                 transmute(new = coalesce(!!! .))) %>% 
     set_names(unique(dictionary$new))

Or if we use a named list, we can make use of imap in assignment :=

dictionary %>%
    {split(as.character(.$old), .$new)} %>% 
    imap_dfc(~ select(df1, .x) %>% 
           mutate_all(as.character) %>%
           transmute(!! .y := coalesce(!!! .)))
#  identifiant project subject
#1          I1       1       A
#2          I2       2       B
#3          I3       3       C
#4          I4       4       D
#5          I5       5       E
#6          I6       6       F

Upvotes: 2

NelsonGon
NelsonGon

Reputation: 13309

With dplyr, we could do:

df %>% 
   tidyr::pivot_longer(cols= contains("project"),
                       names_to="proj",
                       values_to = "vals") %>% 
   filter(!is.na(vals)) %>% 
   tidyr::pivot_longer(cols=contains("subject"),
                       names_to = "subj",
                       values_to = "other_val") %>% 
   na.omit() %>% 
  rename(project=vals, subject=other_val) %>% 
   select(-proj,-subj)
# A tibble: 6 x 3
  identifiant project subject
  <fct>         <dbl> <fct>  
1 I1                1 A      
2 I2                2 B      
3 I3                3 C      
4 I4                4 D      
5 I5                5 E      
6 I6                6 F 

Upvotes: 1

Related Questions