Reputation: 73
I have two data frames that I would like to merge, with the values in each cell being pasted (with a space) together.
x <- letters[1:5]
x1 <- letters[7:11]
x.d <- data.frame(x,x1)
names(x.d) <- c("California","Nevada")
rownames(x.d) <- c("one","two","three","four","five")
x.d
California Nevada
one a g
two b h
three c i
four d j
five e k
y <- letters[2:5]
y1 <- letters[12:15]
y.d <- data.frame(y,y1)
y.d[2,2] <- NA
names(y.d) <- c("Nevada","Texas")
rownames(y.d) <- c("one","two","four","five")
y.d
Nevada Texas
one b l
two c <NA>
four d n
five e o
##Desired output "all"
a1 <- c("a","b","c","d","e")
a2 <- c("g b","h c","i","j d","k e")
a3 <- c("l","","","n","o")
all <- data.frame(a1,a2,a3)
names(all) <- c("California","Nevada","Texas")
rownames(all) <- c("one","two","three","four","five")
all
California Nevada Texas
one a g b l
two b h c
three c i
four d j d n
five e k e o
Where the result should look the variable "all". I was trying tidyverse and lapply functions, but I can't figure this out. Any ideas?
Upvotes: 0
Views: 127
Reputation: 47300
We might use {powerjoin}
x.d <- data.frame(
California = c("a", "b", "c", "d", "e"),
Nevada = c("g", "h", "i", "j", "k"),
row.names = c("one", "two", "three", "four", "five")
)
y.d <- data.frame(
Nevada = c("b", "c", "d", "e"),
Texas = c("l", NA, "n", "o"),
row.names = c("one", "two", "four", "five")
)
library(powerjoin)
power_left_join(
tibble::rownames_to_column(x.d, "id"),
tibble::rownames_to_column(y.d, "id"),
by = "id",
conflict = paste_xy
)
#> id California Texas Nevada
#> 1 one a l g b
#> 2 two b <NA> h c
#> 3 three c <NA> i
#> 4 four d n j d
#> 5 five e o k e
Created on 2022-11-03 with reprex v2.0.2
Upvotes: 1
Reputation: 71
First you need a column to join by, so we'll take your row names and make them into a column for each data frame:
x.d <- tibble::rownames_to_column(x.d)
y.d <- rownames_to_column(y.d)
x.d
rowname California Nevada
one a g
two b h
three c i
four d j
five e k
y.d
rowname Nevada Texas
one b l
two c <NA>
four d n
five e o
Now you can do your join by the rowname
column, which will give you Nevada.x
and Nevada.y
. Then use tidyr::unite()
to paste the strings from the two Nevada columns together, using the sep
argument to get the space in between, remove = T
to discard the original columns, and na.rm = T
to discard NA before the merge. Finally just replace_na()
to get rid of the NA in Texas and turn your rowname
column back into row names.
x.d %>%
full_join(y.d, by = "rowname") %>%
unite("Nevada", Nevada.x, Nevada.y, sep = " ", remove = TRUE, na.rm = TRUE) %>%
mutate(Texas = replace_na(Texas, "")) %>%
column_to_rownames("rowname")
California Nevada Texas
one a g b l
two b h c
three c i
four d j d n
five e k e o
Upvotes: 2
Reputation: 123783
Instead of a join using bind_rows
+ group_by
+ summarise
you could do:
library(dplyr, warn.conflicts = FALSE)
x.d$id <- row.names(x.d)
y.d$id <- row.names(y.d)
bind_rows(x.d, y.d) %>%
group_by(id) %>%
summarise(across(everything(), ~paste(.x[!is.na(.x)], collapse = " ")))
#> # A tibble: 5 × 4
#> id California Nevada Texas
#> <chr> <chr> <chr> <chr>
#> 1 five e k e "o"
#> 2 four d j d "n"
#> 3 one a g b "l"
#> 4 three c i ""
#> 5 two b h c ""
Upvotes: 2