Addison
Addison

Reputation: 143

Suffixes when merging more than two data frames with full_join

I would like to used nested full_join to merge several data frames together. In addition, I am hoping to be able to add suffixes to all of the columns so that when the data frames are merged each column name indicates which data frame it came from (e.g., a unique time identifier like T1, T2, ...).

x <- data.frame(i = c("a","b","c"), j = 1:3, h = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, h = 1:3, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, h = 1:3, stringsAsFactors=FALSE)

full_join(x, y, by='i') %>% left_join(., z, by='I')

Is there a way to integrate the default suffix option so that I get a dataset with column names that look like:

column_names <- c("i", "j_T1", "h_T1", "k_T2", "h_T2", "l_T3", "h_T3")

Upvotes: 1

Views: 1835

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389055

Put the dataframes in a list, add the dataframe name as suffix and perform the join.

library(dplyr)
library(purrr)

lst(x, y, z) %>%
  imap(function(x, y) x %>% rename_with(~paste(., y, sep = '_'), -i)) %>%
  reduce(full_join, by = 'i')

#  i j_x h_x k_y h_y l_z h_z
#1 a   1   1  NA  NA   9   3
#2 b   2   2   4   1  NA  NA
#3 c   3   3   5   2   7   1
#4 d  NA  NA   6   3   8   2

Upvotes: 3

Zo&#235; Turner
Zo&#235; Turner

Reputation: 499

I think this can be done by working with the column headers using purrr but I've used pivot_wider and pivot_longer to change the header names:

df <- x %>% 
  full_join(y, by = "i") %>% 
  full_join(z, by = "i") %>% 
  pivot_longer(cols = -i,
               names_to = "columns",
               values_to = "values") %>% # makes the column headers into a column 
which can be changed
  mutate(columns = str_replace(columns, ".x", "_T2"),
         columns = str_replace(columns, ".y", "_T3"),
         columns = case_when(!str_detect(columns, "T") ~ paste0(columns, "_T1"),
                             TRUE ~ columns)) %>% 
  pivot_wider(names_from = columns,
              values_from = values)

These don't match the listed headers but hopefully this code will help to get you started if the order is important and column l should be T3 (there was only 1 in this example).

Upvotes: 1

Related Questions