Schatzi121
Schatzi121

Reputation: 73

R Merging data.frame and concatenating values

I have two data frames that I would like to merge, with the values in each cell being pasted (with a space) together.

x <- letters[1:5]
x1 <- letters[7:11]
x.d <- data.frame(x,x1)
names(x.d) <- c("California","Nevada")
rownames(x.d) <- c("one","two","three","four","five")
x.d
      California Nevada
one            a      g
two            b      h
three          c      i
four           d      j
five           e      k
 
y <- letters[2:5]
y1 <- letters[12:15]
y.d <- data.frame(y,y1)
y.d[2,2] <- NA
names(y.d) <- c("Nevada","Texas")
rownames(y.d) <- c("one","two","four","five")
y.d
     Nevada Texas
one       b     l
two       c  <NA>
four      d     n
five      e     o
 
##Desired output "all"
a1 <- c("a","b","c","d","e")
a2 <- c("g b","h c","i","j d","k e")
a3 <- c("l","","","n","o")
all <- data.frame(a1,a2,a3)
names(all) <- c("California","Nevada","Texas")
rownames(all) <- c("one","two","three","four","five")
all
      California Nevada Texas
one            a    g b     l
two            b    h c      
three          c      i      
four           d    j d     n
five           e    k e     o

Where the result should look the variable "all". I was trying tidyverse and lapply functions, but I can't figure this out. Any ideas?

Upvotes: 0

Views: 127

Answers (3)

moodymudskipper
moodymudskipper

Reputation: 47300

We might use {powerjoin}

x.d <- data.frame(
  California = c("a", "b", "c", "d", "e"),
  Nevada = c("g", "h", "i", "j", "k"),
  row.names = c("one", "two", "three", "four", "five")
)

y.d <- data.frame(
  Nevada = c("b", "c", "d", "e"),
  Texas = c("l", NA, "n", "o"),
  row.names = c("one", "two", "four", "five")
)

library(powerjoin)
power_left_join(
  tibble::rownames_to_column(x.d, "id"),
  tibble::rownames_to_column(y.d, "id"),
  by = "id",
  conflict = paste_xy
)
#>      id California Texas Nevada
#> 1   one          a     l    g b
#> 2   two          b  <NA>    h c
#> 3 three          c  <NA>      i
#> 4  four          d     n    j d
#> 5  five          e     o    k e

Created on 2022-11-03 with reprex v2.0.2

Upvotes: 1

Emily
Emily

Reputation: 71

First you need a column to join by, so we'll take your row names and make them into a column for each data frame:

x.d <- tibble::rownames_to_column(x.d)
y.d <- rownames_to_column(y.d)
x.d
rowname California Nevada
    one          a      g
    two          b      h
  three          c      i
   four          d      j
   five          e      k

y.d
rowname Nevada Texas
    one      b     l
    two      c  <NA>
   four      d     n
   five      e     o

Now you can do your join by the rowname column, which will give you Nevada.x and Nevada.y. Then use tidyr::unite() to paste the strings from the two Nevada columns together, using the sep argument to get the space in between, remove = T to discard the original columns, and na.rm = T to discard NA before the merge. Finally just replace_na() to get rid of the NA in Texas and turn your rowname column back into row names.

x.d %>%
   full_join(y.d, by = "rowname") %>%
   unite("Nevada", Nevada.x, Nevada.y, sep = " ", remove = TRUE, na.rm = TRUE) %>%
   mutate(Texas = replace_na(Texas, "")) %>%
   column_to_rownames("rowname")

        California Nevada Texas
    one          a    g b     l
    two          b    h c      
  three          c      i      
   four          d    j d     n
   five          e    k e     o

Upvotes: 2

stefan
stefan

Reputation: 123783

Instead of a join using bind_rows + group_by + summarise you could do:

library(dplyr, warn.conflicts = FALSE)

x.d$id <- row.names(x.d)
y.d$id <- row.names(y.d)
 
bind_rows(x.d, y.d) %>%
  group_by(id) %>%
  summarise(across(everything(), ~paste(.x[!is.na(.x)], collapse = " ")))
#> # A tibble: 5 × 4
#>   id    California Nevada Texas
#>   <chr> <chr>      <chr>  <chr>
#> 1 five  e          k e    "o"  
#> 2 four  d          j d    "n"  
#> 3 one   a          g b    "l"  
#> 4 three c          i      ""   
#> 5 two   b          h c    ""

Upvotes: 2

Related Questions