Reputation: 159
I have this data.frame :
data.frame(identifiant = paste0("I",c(1:6)),
project1 = c(1,NA,3,NA,NA,NA),project2 = c(NA,NA,NA,4,5,NA),project3 = c(NA,NA,NA,NA,NA,6),
subject1 = c("A","B",NA,NA,NA,NA),subject2 = c(NA,NA,"C","D","E",NA),subject3 = c(NA,NA,NA,NA,NA,"F"))
This data :
identifiant project1 project2 project3 subject1 subject2 subject3
1 I1 1 NA NA A <NA> <NA>
2 I2 NA NA NA B <NA> <NA>
3 I3 3 NA NA <NA> C <NA>
4 I4 NA 4 NA <NA> D <NA>
5 I5 NA 5 NA <NA> E <NA>
6 I6 NA NA 6 <NA> <NA> F
And I would like this data :
identifiant project subject
1 I1 1 A
2 I2 NA B
3 I3 3 C
4 I4 4 D
5 I5 5 E
6 I6 6 F
So a columns merging but according a 'data dictionary' :
dictionary <- data.frame(old = c("identifiant","project1", "project2", "project3","subject1", "subject2", "subject3"),
new = c("identifiant","project","project","project","subject","subject","subject"))
old new
1 identifiant identifiant
2 project1 project
3 project2 project
4 project3 project
5 subject1 subject
6 subject2 subject
7 subject3 subject
Does anyone have a solution without going into something very complicated?
Upvotes: 1
Views: 67
Reputation: 887048
Another option is coalesce
to apply on the two sets of columns ('project', 'subject')
library(dplyr)
df1 %>%
mutate_at(vars(starts_with('subject')), as.character)%>%
transmute(identifiant,
project = coalesce(!!! select(., starts_with('project'))),
subject = coalesce(!!! select(., starts_with('subject'))))
# identifiant project subject
#1 I1 1 A
#2 I2 2 B
#3 I3 3 C
#4 I4 4 D
#5 I5 5 E
#6 I6 6 F
Or if we need to use the 'dictionary' dataset, split it to a list
based on the 'new' values
library(purrr)
dictionary %>%
group_split(new, keep = FALSE) %>%
map_dfc(~ .x %>%
pull(old) %>%
as.character(.x) %>%
select(df1, .) %>%
mutate_all(as.character) %>%
transmute(new = coalesce(!!! .))) %>%
set_names(unique(dictionary$new))
Or if we use a named list
, we can make use of imap
in assignment :=
dictionary %>%
{split(as.character(.$old), .$new)} %>%
imap_dfc(~ select(df1, .x) %>%
mutate_all(as.character) %>%
transmute(!! .y := coalesce(!!! .)))
# identifiant project subject
#1 I1 1 A
#2 I2 2 B
#3 I3 3 C
#4 I4 4 D
#5 I5 5 E
#6 I6 6 F
Upvotes: 2
Reputation: 13309
With dplyr
, we could do:
df %>%
tidyr::pivot_longer(cols= contains("project"),
names_to="proj",
values_to = "vals") %>%
filter(!is.na(vals)) %>%
tidyr::pivot_longer(cols=contains("subject"),
names_to = "subj",
values_to = "other_val") %>%
na.omit() %>%
rename(project=vals, subject=other_val) %>%
select(-proj,-subj)
# A tibble: 6 x 3
identifiant project subject
<fct> <dbl> <fct>
1 I1 1 A
2 I2 2 B
3 I3 3 C
4 I4 4 D
5 I5 5 E
6 I6 6 F
Upvotes: 1